English to Spanish Translation AI Model
This repository contains a Transformer-based model, Helsinki-NLP/opus-mt-en-es, fine-tuned for English-to-Spanish text translation. The model was fine-tuned, converted to FP16, and evaluated with sacrebleu on a validation set (BLEU ~34+). It is intended for real-world use cases such as educational tools, real-time communication, and travel assistants.
Features
- Language Pair: English → Spanish
- Model: Helsinki-NLP/opus-mt-en-es
- Quantized: FP16 for efficient inference
- High Accuracy: BLEU ~34+ on the validation set (sacrebleu)
- CUDA Enabled: Fast training and inference
Dataset Used
Hugging Face Dataset: OscarNav/spa-eng
- Source: OscarNav
- Language Pair: en-es
- Dataset Size: ~107K sentence pairs
from datasets import load_dataset
dataset = load_dataset("OscarNav/spa-eng", lang1="en", lang2="es")
Model Training & Fine-Tuning
Pretrained Base Model: Helsinki-NLP/opus-mt-en-es
Tokenizer: AutoTokenizer from Hugging Face Transformers
Training Environment: Kaggle Notebook with CUDA GPU
Batch Size: 16
Epochs: 3–5 (based on early stopping)
Optimizer: AdamW
Loss Function: CrossEntropyLoss
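The epoch count listed above depends on early stopping, but the exact criterion is not given. As a rough sketch of one common rule, the helper below stops once the validation loss has failed to improve for `patience` consecutive epochs, while staying within the 3 to 5 epoch range. The function name, the `patience` default, and the sample loss values are illustrative assumptions, not values taken from the training run.

```python
def epochs_to_train(val_losses, min_epochs=3, max_epochs=5, patience=1):
    """Return how many epochs run before early stopping triggers.

    val_losses: validation loss recorded after each epoch, in order.
    """
    best = float("inf")
    stale = 0        # consecutive epochs without improvement
    epochs_run = 0   # stays 0 if no losses are supplied
    for epochs_run, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
        # Never stop before min_epochs, even if the loss has stalled.
        if epochs_run >= min_epochs and stale >= patience:
            break
    return epochs_run


# Hypothetical losses: improvement stalls at epoch 4, so training stops there.
print(epochs_to_train([1.0, 0.8, 0.7, 0.72, 0.71]))  # prints 4
```

In practice the same behaviour is usually delegated to the training framework's own early-stopping callback rather than written by hand.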
Quantization (FP16)
The model weights were converted to half precision (FP16) to reduce memory usage and speed up inference without compromising translation quality.
model = model.half()
model.save_pretrained("quantized_model_fp16")
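To translate with the saved checkpoint, something like the following should work. This is a sketch, not the repository's own inference script. It assumes a CUDA GPU, that `quantized_model_fp16` is the directory written by `save_pretrained` above, and that the tokenizer is loaded from the base model, since the snippet above saves only the model. The helper names `chunks` and `translate`, the batch size of 16 (borrowed from training), and `max_new_tokens=128` are illustrative choices.

```python
def chunks(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences, model_dir="quantized_model_fp16", batch_size=16):
    """Translate a list of English sentences to Spanish with the FP16 checkpoint."""
    # Heavy dependencies are imported here so `chunks` stays usable without them.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    # Assumption: the tokenizer was not saved with the model, so use the base one.
    tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-es")
    model = AutoModelForSeq2SeqLM.from_pretrained(
        model_dir, torch_dtype=torch.float16
    ).to("cuda").eval()

    outputs = []
    for batch in chunks(sentences, batch_size):
        # Pad to the longest sentence in the batch and move tensors to the GPU.
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True
        ).to("cuda")
        generated = model.generate(**inputs, max_new_tokens=128)
        outputs.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return outputs
```

Calling `translate(["How are you?"])` would return a list with one Spanish string. Running FP16 weights on CPU may be slow or unsupported depending on the PyTorch version, which is why the sketch targets CUDA only.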
Scoring
BLEU Score: ~34+
Evaluation Metric: sacrebleu on validation set
Inference Accuracy: Verified using real-world sample sentences
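The evaluation script itself is not included here. A minimal sketch of how corpus-level BLEU could be computed with sacrebleu, assuming one reference translation per sentence, is shown below. The function name `bleu_score` is hypothetical.

```python
def bleu_score(hypotheses, references):
    """Corpus-level BLEU for model outputs, with one reference per sentence."""
    if len(hypotheses) != len(references):
        raise ValueError("hypotheses and references must have the same length")
    if not hypotheses:
        raise ValueError("at least one sentence pair is required")

    # Imported after validation so input errors surface without the dependency.
    import sacrebleu

    # sacrebleu expects a list of reference streams; here there is a single stream.
    result = sacrebleu.corpus_bleu(hypotheses, [references])
    return result.score  # BLEU on a 0-100 scale
```

Here `hypotheses` would be the model's Spanish outputs for the validation sentences and `references` the corresponding Spanish side of the dataset. Scoring the whole validation set in one call matters, because corpus-level BLEU is not the average of per-sentence scores.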