RysOCR - Polish OCR LoRA for PaddleOCR-VL
A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for Polish text recognition, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).
Motivation
Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often substituting:
- ą → a
- ę → e
- ł → l or t
- ó → o
- etc.
This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.
Model Details
| Property | Value |
|---|---|
| Base Model | PaddlePaddle/PaddleOCR-VL |
| Method | LoRA (Low-Rank Adaptation) |
| LoRA Rank | 16 |
| LoRA Alpha | 32 |
| Target Modules | q_proj, k_proj, v_proj, o_proj |
| Training Framework | PEFT 0.18.0 + Transformers |
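To make the rank and alpha values in the table concrete, the sketch below works out the LoRA scaling factor and the number of trainable weights the adapter adds. It assumes standard LoRA scaling (`alpha / rank`). The `hidden_size` of 1024 and the treatment of all four projections as square matrices are hypothetical placeholders, not values read from the PaddleOCR-VL config, so the parameter counts are illustrative only.

```python
# Illustrative arithmetic only: hidden_size is a hypothetical placeholder,
# not a value taken from the PaddleOCR-VL configuration.
LORA_RANK = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]


def lora_scaling(alpha: int, rank: int) -> float:
    """Standard LoRA multiplies the low-rank update B @ A by alpha / rank."""
    return alpha / rank


def lora_params_per_module(d_in: int, d_out: int, rank: int) -> int:
    """A has shape (rank, d_in) and B has shape (d_out, rank)."""
    return rank * (d_in + d_out)


hidden_size = 1024  # assumption for illustration
per_module = lora_params_per_module(hidden_size, hidden_size, LORA_RANK)
per_layer = per_module * len(TARGET_MODULES)

print(f"scaling factor: {lora_scaling(LORA_ALPHA, LORA_RANK)}")
print(f"trainable weights per projection: {per_module}")
print(f"trainable weights per attention layer: {per_layer}")
```

Models that use grouped-query attention have smaller `k_proj` and `v_proj` matrices, so the real count per layer may be lower than this square-matrix estimate.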
Usage
from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")
processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True,
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "
inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}
outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs[0], skip_special_tokens=True)
print(text)
Training Details
- Training Data: 10,000 synthetic Polish document images
- Categories: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
- Hardware: Trained with LoRA to enable fine-tuning on consumer hardware (4-6GB VRAM)
- Epochs: 1 epoch over full dataset
- Optimizer: AdamW with linear learning rate schedule
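As a rough illustration of what a linear learning rate schedule does over one epoch, the sketch below computes the learning rate at a given step in plain Python. Only the dataset size (10,000 images) and the single epoch come from the list above. The batch size, base learning rate, and the absence of warmup are hypothetical choices made for the example, not the settings used to train this adapter.

```python
def linear_schedule_lr(step: int, total_steps: int, base_lr: float,
                       warmup_steps: int = 0) -> float:
    """Linear warmup from 0 to base_lr, then linear decay from base_lr to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


NUM_IMAGES = 10_000  # from the training details above
BATCH_SIZE = 4       # hypothetical
BASE_LR = 1e-4       # hypothetical

# One epoch over the full dataset.
total_steps = NUM_IMAGES // BATCH_SIZE

for step in (0, total_steps // 2, total_steps):
    lr = linear_schedule_lr(step, total_steps, BASE_LR)
    print(f"step {step}: lr = {lr:.2e}")
```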
Baseline Performance (Pre-Fine-Tuning)
Performance of the unmodified PaddleOCR-VL base model on the Polish test set:
| Metric | Value |
|---|---|
| Character Error Rate (CER) | 5.58% |
| Word Error Rate (WER) | 13.37% |
| Exact Match | 74.00% |
| Diacritic Accuracy | 74.14% |
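For readers who want to reproduce metrics of this kind, here is a minimal sketch of CER and WER built on Levenshtein edit distance. It assumes the conventional definitions (edit distance divided by reference length, over characters for CER and over whitespace-separated words for WER). The evaluation script behind the table is not shown here, so its exact normalization may differ. The sample strings are made up for illustration.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(1, len(ref))


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(1, len(ref_words))


# Made-up example: the prediction drops three diacritics.
reference = "ul. Długa 5, Łódź"
prediction = "ul. Dluga 5, Lódz"

print(f"CER: {cer(reference, prediction):.2%}")
print(f"WER: {wer(reference, prediction):.2%}")
print(f"Exact match: {reference == prediction}")
```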
Fine-Tuned Performance
Summary of baseline versus fine-tuned results:
| Metric | Baseline | Fine-tuned |
|---|---|---|
| CER | 5.58% | 1.60% |
| WER | 13.37% | 7.21% |
| Exact Match | 74% | 76% |
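The size of the improvement is easier to judge as a relative error reduction. The short calculation below derives it from the CER and WER figures reported in the table; the helper name `relative_reduction` is introduced here for illustration.

```python
# CER and WER percentages as reported in the summary table.
baseline = {"CER": 5.58, "WER": 13.37}
fine_tuned = {"CER": 1.60, "WER": 7.21}


def relative_reduction(before: float, after: float) -> float:
    """Percentage by which an error rate dropped relative to its starting value."""
    return (before - after) / before * 100


for metric in baseline:
    drop = relative_reduction(baseline[metric], fine_tuned[metric])
    print(f"{metric}: {baseline[metric]}% -> {fine_tuned[metric]}% "
          f"({drop:.1f}% relative reduction)")
```

By this measure CER falls by roughly 71% and WER by roughly 46%, while exact match rises by 2 percentage points.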
Key diacritic confusions in baseline:
- ł frequently confused with l or t
- ę sometimes rendered as e
- ś confused with š
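Confusions like these can be tallied automatically by aligning each prediction against its reference. The sketch below uses `difflib.SequenceMatcher` from the standard library for the alignment. The definition of diacritic accuracy it uses (share of reference diacritics that land in a matching aligned span) is an assumption, since the metric behind the reported figure is not defined here, and `diacritic_report` is a hypothetical helper name.

```python
from collections import Counter
from difflib import SequenceMatcher

POLISH_DIACRITICS = set("ąćęłńóśźżĄĆĘŁŃÓŚŹŻ")


def diacritic_report(ref: str, hyp: str):
    """Return (accuracy, confusions) for Polish diacritics in ref versus hyp.

    Assumed definition: accuracy is the fraction of diacritic characters in
    the reference that fall inside an 'equal' span of the alignment.
    """
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    total = sum(ch in POLISH_DIACRITICS for ch in ref)
    correct = 0
    confusions = Counter()
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            correct += sum(ch in POLISH_DIACRITICS for ch in ref[i1:i2])
        elif tag == "replace" and (i2 - i1) == (j2 - j1):
            # Same-length substitutions can be paired character by character.
            for r, h in zip(ref[i1:i2], hyp[j1:j2]):
                if r in POLISH_DIACRITICS:
                    confusions[(r, h)] += 1
    accuracy = correct / total if total else 1.0
    return accuracy, confusions


# Made-up example: the leading ż is read as z.
accuracy, confusions = diacritic_report("żółw", "zółw")
print(f"diacritic accuracy: {accuracy:.2%}")
for (expected, got), count in confusions.items():
    print(f"{expected} -> {got}: {count}")
```

Summing the `Counter` objects over a whole test set would give a ranked list of the most frequent substitutions.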
Limitations
- Optimized for printed Polish text; handwritten recognition may vary
- Best results on clean document scans; heavily degraded images may still have errors
- Inference requires loading both base model and LoRA weights
License
Apache 2.0 (same as base model)
Citation
If you use this model, please cite:
@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}