First Persian SLM
WARNING: This model is pre-trained only; it will be fine-tuned in the future.
The first Persian SLM, by YSNRFD (YASIN ARYANFARD) and AMIRHOSSEIN MEHRDOOST. This model supports only Persian text input; English language support is planned for the future.
ysnrfd Sample Persian Text LINK: https://huggingface.co/datasets/ysn-rfd/fibonacci_alpaca_to_sharegpt_gpt_format_convert_new_dataset_release
Not Yet
ysnrfd Sample Persian Text
Not Yet
The First Persian SLM Trained From Scratch
YSNRFD Architecture
Nvidia Tesla T4
Python Code, From Scratch, PyTorch
import re

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Directory that contains the downloaded model and tokenizer files
model_path = "./Path_To_Model"
print(f"Loading model from {model_path}...")
tokenizer = GPT2Tokenizer.from_pretrained(model_path)
model = GPT2LMHeadModel.from_pretrained(model_path)

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
model.to(device)
model.eval()

# Use "### End" as the stop token only if the tokenizer knows it as a single token
try:
    end_token_id = tokenizer.convert_tokens_to_ids("### End")
    if end_token_id == tokenizer.unk_token_id:
        end_token_id = None
except Exception:
    end_token_id = None

print("\n------+------+------+------ model is ready for testing +------+------++------\n")
print("Type 'exit' to quit")
print("\nYSNRFD")
print("------+------+------+----------------------+------+------++------\n")

while True:
    user_input = input("\nYou: ").strip()
    if user_input.lower() == "exit":
        print("\nGoodbye (ysnrfd)")
        break

    # Prompt template the model expects
    prompt = f"### Human: {user_input}\n### Assistant:"
    inputs = tokenizer(
        prompt,
        return_tensors="pt",
        truncation=True,
        max_length=512
    ).to(device)

    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=400,
            temperature=0.7,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=end_token_id if end_token_id is not None else tokenizer.eos_token_id,
            repetition_penalty=1.05
        )

    # Decode only the newly generated tokens; slicing the decoded string by
    # len(prompt) is unreliable because decoding may not reproduce the prompt exactly
    new_tokens = outputs[0][inputs["input_ids"].shape[1]:]
    assistant_response = tokenizer.decode(new_tokens, skip_special_tokens=False).strip()

    # Keep only the text before the end marker, if the model produced one
    if "### End" in assistant_response:
        assistant_response = assistant_response.split("### End")[0].strip()
    # Drop a repeated "### Assistant:" prefix
    assistant_response = re.sub(r'^###\s*Assistant:\s*', '', assistant_response)

    if assistant_response:
        print(f"\nBot: {assistant_response}")
    else:
        print("\nBot: please say that again")
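The prompt construction and response clean-up in the script above can be factored into small pure functions, which makes them testable without loading the model. This is a minimal sketch, assuming the `### Human:` / `### Assistant:` / `### End` template shown in the script; `build_prompt` and `extract_reply` are hypothetical helper names, not part of the released code.

```python
import re

# End-of-turn marker, as used in the script above
END_MARKER = "### End"


def build_prompt(user_input: str) -> str:
    """Wrap the user's text in the prompt template used by the script above."""
    return f"### Human: {user_input.strip()}\n### Assistant:"


def extract_reply(generated_text: str) -> str:
    """Clean the text the model generated after the prompt."""
    # Keep only what precedes the first end marker, if one is present
    reply = generated_text.split(END_MARKER, 1)[0]
    # Drop a repeated "### Assistant:" prefix
    reply = re.sub(r"^\s*###\s*Assistant:\s*", "", reply)
    return reply.strip()
```

In the chat loop, `build_prompt(user_input)` would replace the inline f-string, and `extract_reply(...)` would be applied to the decoded new tokens before printing.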