REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding • Paper • 2511.13026 • Published Nov 17, 2025 • 25
HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration • Paper • 2510.27266 • Published Oct 31, 2025 • 20
Shuffle-R1: Efficient RL framework for Multimodal Large Language Models via Data-centric Dynamic Shuffle • Paper • 2508.05612 • Published Aug 7, 2025 • 2