RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published Nov 27, 2025 • 14