Tiny QA Benchmark++: Ultra-Lightweight, Synthetic Multilingual Dataset Generation & Smoke-Tests for Continuous LLM Evaluation Paper • 2505.12058 • Published May 17, 2025