RoboVIP: Multi-View Video Generation with Visual Identity Prompting Augments Robot Manipulation Paper • 2601.05241 • Published 2 days ago • 21
MesaTask: Towards Task-Driven Tabletop Scene Generation via 3D Spatial Reasoning Paper • 2509.22281 • Published Sep 26, 2025 • 32
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 139
SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent Paper • 2509.20414 • Published Sep 24, 2025 • 9
Hunyuan3D-Omni: A Unified Framework for Controllable Generation of 3D Assets Paper • 2509.21245 • Published Sep 25, 2025 • 39
PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation Paper • 2509.20358 • Published Sep 24, 2025 • 14
How Far are VLMs from Visual Spatial Intelligence? A Benchmark-Driven Perspective Paper • 2509.18905 • Published Sep 23, 2025 • 29
Hunyuan3D Studio: End-to-End AI Pipeline for Game-Ready 3D Asset Generation Paper • 2509.12815 • Published Sep 16, 2025 • 40