SpaceVista: All-Scale Visual Spatial Reasoning from mm to km Paper • 2510.09606 • Published 24 days ago
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models Paper • 2507.23682 • Published Jul 31
ThinkSound: Chain-of-Thought Reasoning in Multimodal Large Language Models for Audio Generation and Editing Paper • 2506.21448 • Published Jun 26