How well do VLM-based OCR models handle Victorian theatre playbills? 🎭

Last week I shared OCR Time Capsule for comparing traditional vs VLM-based OCR. I've now added some examples from challenging collections: The British Library's Theatrical playbills from Britain and Ireland collection.

These 150-year-old documents are brutal for OCR:
- Decorative fonts in every size imaginable
- Multi-column layouts with text at odd angles
- Faded ink and show-through from the reverse
- ALL CAPS DRAMATIC ANNOUNCEMENTS!!!

For this dataset I used the RolmOCR model from Reducto (processed via HF Jobs - love how easy UV scripts make GPU inference!).

The results? The improvements over traditional OCR are even more dramatic than with exam papers.

🔗 Explore the app: https://huggingface.co/spaces/davanstrien/ocr-time-capsule
📚 BL Theatre dataset: https://bl.iro.bl.uk/concern/datasets/a8534aff-c8e3-4fc8-adc1-da542080b1e3

I'll continue to work through the suggestions I got last week but feel free to suggest other hairy OCR challenges to compare VLMs vs existing OCR!

#DigitalHumanities #OCR #GLAM #BritishLibrary #TheatreHistory