The Case for End-to-End and Efficient Domain-Specific Vision-Language Models for OCR
Extract text from images and PDFs