GroupRank: A Groupwise Reranking Paradigm Driven by Reinforcement Learning Paper • 2511.11653 • Published Nov 10, 2025 • 55
CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models Paper • 2406.06007 • Published Jun 10, 2024 • 2
Democratizing Reasoning Ability: Tailored Learning from Large Language Model Paper • 2310.13332 • Published Oct 20, 2023 • 16
CREAM: Consistency Regularized Self-Rewarding Language Models Paper • 2410.12735 • Published Oct 16, 2024
MJ-VIDEO: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation Paper • 2502.01719 • Published Feb 3, 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 58