RysOCR - Polish OCR LoRA for PaddleOCR-VL

A LoRA adapter for PaddleOCR-VL, fine-tuned specifically for Polish text recognition, with emphasis on correct handling of Polish diacritics (ą, ć, ę, ł, ń, ó, ś, ź, ż).

Motivation

Polish is underrepresented in OCR training data. Most vision-language OCR models struggle with Polish diacritics, often substituting:

ą → a
ę → e
ł → l or t
ó → o
etc.

This model addresses that gap by fine-tuning on synthetic Polish document images covering addresses, invoices, receipts, names, and common phrases.

Model Details

Property	Value
Base Model	PaddlePaddle/PaddleOCR-VL
Method	LoRA (Low-Rank Adaptation)
LoRA Rank	16
LoRA Alpha	32
Target Modules	q_proj, k_proj, v_proj, o_proj
Training Framework	PEFT 0.18.0 + Transformers
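To give a feel for what the rank and alpha values above imply, the sketch below computes the standard LoRA scaling factor (alpha / rank) and the number of trainable parameters a rank-16 adapter adds per adapted projection. The hidden size of 1024 and the assumption that all four projections are square matrices are illustrative placeholders, not values read from the PaddleOCR-VL configuration.

```python
# LoRA hyperparameters taken from the table above.
LORA_RANK = 16
LORA_ALPHA = 32
TARGET_MODULES = ["q_proj", "k_proj", "v_proj", "o_proj"]

# ASSUMPTION: hypothetical hidden size, and every projection treated as
# HIDDEN_SIZE x HIDDEN_SIZE. The real model dimensions may differ.
HIDDEN_SIZE = 1024


def lora_scaling(alpha: int, rank: int) -> float:
    """Factor applied to the low-rank update B @ A in standard LoRA."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added to one linear layer.

    A has shape (rank, d_in) and B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


scaling = lora_scaling(LORA_ALPHA, LORA_RANK)
per_module = lora_params(HIDDEN_SIZE, HIDDEN_SIZE, LORA_RANK)
per_layer = per_module * len(TARGET_MODULES)

print(f"scaling factor: {scaling}")
print(f"added parameters per projection: {per_module}")
print(f"added parameters per transformer layer: {per_layer}")
```

With alpha set to twice the rank, the adapter update is scaled by 2 before being added to the frozen weights.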

Usage

from transformers import AutoModelForCausalLM, AutoProcessor
from peft import PeftModel
from PIL import Image

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "PaddlePaddle/PaddleOCR-VL",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto"
)

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "anon13370/RysOCR")

processor = AutoProcessor.from_pretrained(
    "anon13370/RysOCR",
    trust_remote_code=True
)

# Run inference
image = Image.open("your_document.png")
prompt = "OCR: "

inputs = processor(images=image, text=prompt, return_tensors="pt")
inputs = {k: v.to(model.device) for k, v in inputs.items()}

outputs = model.generate(**inputs, max_new_tokens=256)
text = processor.decode(outputs[0], skip_special_tokens=True)
print(text)

Training Details

Training Data: 10,000 synthetic Polish document images
Categories: Addresses, invoice lines, receipt lines, dates, names, prices, phrases
Hardware: LoRA was used so that fine-tuning fits on consumer hardware (4-6 GB VRAM)
Epochs: 1 epoch over full dataset
Optimizer: AdamW with linear learning rate schedule
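As a rough illustration of the linear learning rate schedule mentioned above, the sketch below decays the rate linearly to zero over one epoch of 10,000 images. The batch size, peak learning rate, and absence of warmup are assumptions made for the example; this card does not state them.

```python
def linear_lr(step: int, total_steps: int, base_lr: float, warmup_steps: int = 0) -> float:
    """Linear warmup from 0 to base_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Dataset size and epoch count come from the training details above.
NUM_IMAGES = 10_000
EPOCHS = 1

# ASSUMPTIONS (hypothetical, not stated in this card):
BATCH_SIZE = 4   # samples per optimizer step
BASE_LR = 1e-4   # peak learning rate

total_steps = (NUM_IMAGES // BATCH_SIZE) * EPOCHS

for step in (0, total_steps // 2, total_steps):
    print(f"step {step:>5}: lr = {linear_lr(step, total_steps, BASE_LR):.2e}")
```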

Baseline Performance (Pre-Fine-Tuning)

Baseline PaddleOCR-VL performance on the Polish test set:

Metric	Value
Character Error Rate (CER)	5.58%
Word Error Rate (WER)	13.37%
Exact Match	74.00%
Diacritic Accuracy	74.14%
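For readers unfamiliar with these metrics, here is a minimal sketch of how CER, WER, and exact match are commonly computed from a reference string and a model prediction, using Levenshtein edit distance. It is not the evaluation script used for this model, and the sample strings are invented for illustration.

```python
def edit_distance(ref, hyp) -> int:
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference item
                curr[j - 1] + 1,     # insert a hypothesis item
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(1, len(ref))


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(1, len(ref_words))


# Hypothetical example: a prediction that drops several diacritics.
reference = "Łódź, ul. Żółta 5"
prediction = "Lódz, ul. Zólta 5"

print(f"CER: {cer(reference, prediction):.2%}")
print(f"WER: {wer(reference, prediction):.2%}")
print(f"Exact match: {reference == prediction}")
```

Four dropped diacritics are enough to make half of the words wrong in this example, which is why WER sits well above CER in the table.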

Summary of results, baseline vs. fine-tuned:

Metric	Baseline	Fine-tuned
CER	5.58%	1.60%
WER	13.37%	7.21%
Exact Match	74%	76%
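The size of the improvement is easier to judge as a relative error reduction. The short sketch below derives it from the CER and WER figures in the summary table; only the helper name is introduced here.

```python
def relative_reduction(baseline: float, finetuned: float) -> float:
    """Percentage by which the error rate dropped relative to the baseline."""
    return (baseline - finetuned) / baseline * 100.0


# (baseline %, fine-tuned %) pairs copied from the summary table above.
results = {
    "CER": (5.58, 1.60),
    "WER": (13.37, 7.21),
}

for name, (baseline, finetuned) in results.items():
    absolute = baseline - finetuned
    relative = relative_reduction(baseline, finetuned)
    print(f"{name}: {absolute:.2f} points absolute, {relative:.1f}% relative reduction")
```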

Key diacritic confusions in baseline:

ł frequently confused with l or t
ę sometimes rendered as e
ś confused with š
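Confusions like those listed above can be tallied by aligning each reference string with the model output and counting character substitutions that hit a Polish diacritic. The sketch below uses Python's difflib for the alignment. The diacritic accuracy function shows one plausible definition (the share of reference diacritics that land in matching spans), which may differ from the definition behind the figure reported in this card. The sample strings are invented.

```python
from collections import Counter
from difflib import SequenceMatcher

LOWER = "ąćęłńóśźż"
POLISH_DIACRITICS = set(LOWER + LOWER.upper())


def diacritic_confusions(ref: str, hyp: str) -> Counter:
    """Count (reference_char, predicted_char) pairs where a diacritic was misread."""
    confusions = Counter()
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements give an unambiguous character pairing.
        if tag == "replace" and (i2 - i1) == (j2 - j1):
            for r, h in zip(ref[i1:i2], hyp[j1:j2]):
                if r in POLISH_DIACRITICS:
                    confusions[(r, h)] += 1
    return confusions


def diacritic_accuracy(ref: str, hyp: str) -> float:
    """ASSUMED definition: fraction of reference diacritics inside matched spans."""
    total = sum(ch in POLISH_DIACRITICS for ch in ref)
    if total == 0:
        return 1.0
    matcher = SequenceMatcher(None, ref, hyp, autojunk=False)
    correct = 0
    for tag, i1, i2, _, _ in matcher.get_opcodes():
        if tag == "equal":
            correct += sum(ch in POLISH_DIACRITICS for ch in ref[i1:i2])
    return correct / total


# Hypothetical example reproducing the three confusions listed above.
reference = "łąka się śmieje"
prediction = "tąka sie šmieje"

for (ref_char, hyp_char), count in diacritic_confusions(reference, prediction).items():
    print(f"{ref_char} -> {hyp_char}: {count}")
print(f"diacritic accuracy: {diacritic_accuracy(reference, prediction):.2f}")
```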

Limitations

Optimized for printed Polish text; accuracy on handwritten text may vary
Best results on clean document scans; heavily degraded images may still have errors
Inference requires loading both base model and LoRA weights
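If loading two sets of weights at inference time is inconvenient, PEFT can fold a LoRA adapter into the base weights with merge_and_unload. The sketch below shows the general pattern. It has not been verified against PaddleOCR-VL's custom model code, the function name and output directory are hypothetical, and the merged checkpoint would most likely still need trust_remote_code=True when loaded.

```python
def merge_adapter(base_id: str, adapter_id: str, output_dir: str) -> None:
    """Merge LoRA weights into the base model and save a standalone checkpoint."""
    # Imported lazily so the heavy dependencies are only needed when this runs.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        trust_remote_code=True,
        torch_dtype="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_id)

    # Folds the low-rank updates into the base weights and drops the PEFT wrappers.
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir)


# Example call (downloads both models; "rysocr-merged" is a hypothetical path):
# merge_adapter("PaddlePaddle/PaddleOCR-VL", "anon13370/RysOCR", "rysocr-merged")
```

Merging trades flexibility for convenience: the adapter can no longer be detached or swapped once it has been folded into the base weights.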

License

Apache 2.0 (same as base model)

Citation

If you use this model, please cite:

@misc{rysocr2024,
  title={RysOCR: Polish OCR LoRA for PaddleOCR-VL},
  author={Kacper Wikieł},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/anon13370/RysOCR}
}
