This is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct specifically optimized for Sanskrit language processing. The model has been trained using LoRA (Low-Rank Adaptation) on the Sanskrit-transliteration-chat-dataset (replacing the earlier Sanskrit-llama dataset) to excel in three key areas:

- Devanagari-to-IAST transliteration
- Sanskrit-to-English translation
- English-to-Sanskrit translation

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen2.5-7B-Instruct |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Sequence Length | 512 tokens |
| Training Epochs | 3 |
| Learning Rate | 2e-05 |
| Batch Size | 2 (micro) × 4 (gradient accumulation) |
| Optimizer | AdamW 8-bit |
| Precision | bfloat16 |
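
For readers who want to reproduce a comparable setup outside the training framework used here, the table above maps onto a PEFT `LoraConfig` roughly as follows. This is only a sketch: the `target_modules` list and `lora_dropout` value are assumptions based on common Qwen2.5 LoRA recipes, not values taken from the actual training run (see the training configuration further down).

```python
from peft import LoraConfig

# Illustrative sketch mirroring the hyperparameters above.
# target_modules and lora_dropout are assumptions, not from the actual run.
lora_config = LoraConfig(
    r=16,               # LoRA rank (table above)
    lora_alpha=32,      # LoRA alpha (table above)
    lora_dropout=0.05,  # assumed; not listed in the table
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",   # attention projections
        "gate_proj", "up_proj", "down_proj",      # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```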
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "diabolic6045/Sanskrit-qwen-7B-Translate-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# bfloat16 + device_map="auto" keeps the 7B model practical to load (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
# Prepare the conversation
messages = [
{
"role": "system",
"content": "You are a Sanskrit transliteration expert. Convert the given Sanskrit text from Devanagari script to IAST (International Alphabet of Sanskrit Transliteration) format."
},
{
"role": "user",
"content": "Transliterate this Sanskrit text to IAST: เคฌเฅเคฆเฅเคงเคฟเคถเฅเคเคพเคฐเฅเคฅเคพเคคเฅเคชเคฐเฅ เคฒเฅเคญเค เคธเคจเฅเคคเฅเคทเค เคชเคฐเคฎเค เคธเฅเคเคฎเฅ เฅค"
}
]
# Apply chat template and generate
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
# Output: buddhiścārthātparo lobhaḥ santoṣaḥ paramaṃ sukham |
messages = [
{
"role": "system",
"content": "You are a Sanskrit to English translation expert. Translate the given Sanskrit text accurately while preserving the meaning and context."
},
{
"role": "user",
"content": "Translate this Sanskrit text to English: เคฏเคฆเฅเคเฅเคจเฅ เคธเฅเคฐเฅเคฏเฅเฅ เคตเคฟเฅเคทเค เคชเฅเฅเคฅเคฟเฅเคตเฅเคฏเคพเคฎเฅเคทเฅเคงเฅเคทเฅเฅ เคฏเคคเฅ เฅค"
}
]
# Generate translation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=150, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
# Output: The poison that is in the sun, in the earth and in the herbs...
messages = [
{
"role": "system",
"content": "You are an English to Sanskrit translation expert. Translate the given English text accurately into Sanskrit while preserving the meaning and context."
},
{
"role": "user",
"content": "Translate this English text to Sanskrit: May the divine powers protect us and grant us wisdom."
}
]
# Generate Sanskrit translation
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
# Output: देवाः अस्मान् रक्षन्तु बुद्धिं च प्रयच्छन्तु ।
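
As a lighter-weight alternative to the explicit tokenize/generate calls above, recent versions of `transformers` can take the chat messages directly through a text-generation pipeline. A minimal sketch, assuming a transformers release with chat-aware pipelines (the example sentence is arbitrary and not taken from the training data):

```python
from transformers import pipeline

# Chat-aware text-generation pipeline; loads the model with its default weights.
generator = pipeline(
    "text-generation",
    model="diabolic6045/Sanskrit-qwen-7B-Translate-v2",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {
        "role": "system",
        "content": "You are a Sanskrit to English translation expert. Translate the given Sanskrit text accurately while preserving the meaning and context."
    },
    {
        "role": "user",
        "content": "Translate this Sanskrit text to English: धर्मो रक्षति रक्षितः ।"
    }
]

result = generator(messages, max_new_tokens=150, do_sample=True, temperature=0.7)
print(result[0]["generated_text"][-1]["content"])  # last turn is the model's reply
```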
Try the model with our Gradio interface; the demo provides an interactive way to run the same transliteration and translation prompts shown above.
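
The demo's own code is not included in this card; a minimal Gradio wrapper around the model would look roughly like the sketch below. The interface layout and the `run` helper are illustrative assumptions, not the actual Space implementation.

```python
import torch
import gradio as gr
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "diabolic6045/Sanskrit-qwen-7B-Translate-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")

# Hypothetical helper: one system prompt + one user message -> model reply.
def run(system_prompt, user_text):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_text},
    ]
    text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
    return tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)

demo = gr.Interface(
    fn=run,
    inputs=[gr.Textbox(label="System prompt"), gr.Textbox(label="Input text")],
    outputs=gr.Textbox(label="Model output"),
    title="Sanskrit-qwen-7B-Translate-v2",
)

if __name__ == "__main__":
    demo.launch()
```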
Training dataset: diabolic6045/Sanskrit-transliteration-chat-dataset

# Key training parameters
base_model: Qwen/Qwen2.5-7B-Instruct
adapter: lora
lora_r: 16
lora_alpha: 32
sequence_len: 512
num_epochs: 3
learning_rate: 0.00002
optimizer: adamw_8bit
lr_scheduler: cosine
bf16: auto
flash_attention: true
gradient_checkpointing: true
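
The micro batch size of 2 with 4 gradient-accumulation steps gives an effective batch size of 8 per device. For a sense of what a chat-format training record looks like, here is an illustrative example in the same role/content style the inference prompts above use; the actual field names and schema of diabolic6045/Sanskrit-transliteration-chat-dataset may differ.

```python
# Illustrative training record (schema assumed, not the dataset's exact format).
example_record = {
    "messages": [
        {
            "role": "system",
            "content": "You are a Sanskrit transliteration expert. Convert the given Sanskrit text from Devanagari script to IAST (International Alphabet of Sanskrit Transliteration) format."
        },
        {
            "role": "user",
            "content": "Transliterate this Sanskrit text to IAST: धर्मो रक्षति रक्षितः ।"
        },
        {
            "role": "assistant",
            "content": "dharmo rakṣati rakṣitaḥ |"
        }
    ]
}
```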
| Feature | Previous Model | Current Model |
|---|---|---|
| Base Model | Qwen2.5-7B-Instruct-1M | Qwen2.5-7B-Instruct |
| Dataset | Sanskrit-llama (Alpaca) | Sanskrit-transliteration-chat-dataset |
| Format | Alpaca format | Chat template format |
| Capabilities | Basic translation | Multi-task (transliteration + translation) |
| LoRA Rank | 32 | 16 (optimized) |
| Sequence Length | 1024 | 512 (focused) |
| Training Epochs | 1 | 3 (more thorough) |
| Specialization | General Sanskrit | Specialized for transliteration |
If you use this model in your research, please cite:
@misc{sanskrit-qwen-chat-lora,
title={Sanskrit-qwen-7B-Translate-v2: A Specialized Sanskrit Translation and Transliteration Model},
author={Divax Shah (diabolic6045)},
year={2024},
url={https://huggingface.co/diabolic6045/Sanskrit-qwen-7B-Translate-v2}
}
We welcome contributions to improve this model.
This model is released under the Apache 2.0 License. See the LICENSE file for details.