V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties Paper • 2512.11799 • Published 15 days ago • 29
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation Paper • 2512.03036 • Published 25 days ago • 20
Think Visually, Reason Textually: Vision-Language Synergy in ARC Paper • 2511.15703 • Published Nov 19 • 8
CapRL: Stimulating Dense Image Caption Capabilities via Reinforcement Learning Paper • 2509.22647 • Published Sep 26 • 32
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence Paper • 2510.24693 • Published Oct 28 • 18
Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning Paper • 2510.27606 • Published Oct 31 • 28
Running 6 Open LMM Spatial Leaderboard 🥇 6 A Leaderboard for LMM spatial understanding capabilities
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 122