Video-As-Prompt: Unified Semantic Control for Video Generation Paper • 2510.20888 • Published 6 days ago • 41
A Survey of Data Agents: Emerging Paradigm or Overstated Hype? Paper • 2510.23587 • Published 1 day ago • 59
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published 1 day ago • 106
Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations Paper • 2510.23607 • Published 1 day ago • 153
Advancing End-to-End Pixel Space Generative Modeling via Self-supervised Pre-training Paper • 2510.12586 • Published 15 days ago • 107
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published 20 days ago • 120
D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI Paper • 2510.05684 • Published 22 days ago • 133
Spatial Forcing: Implicit Spatial Representation Alignment for Vision-language-action Model Paper • 2510.12276 • Published 15 days ago • 142
Diffusion Transformers with Representation Autoencoders Paper • 2510.11690 • Published 16 days ago • 160
Ming-UniVision: Joint Image Understanding and Generation with a Unified Continuous Tokenizer Paper • 2510.06590 • Published 21 days ago • 70
Cache-to-Cache: Direct Semantic Communication Between Large Language Models Paper • 2510.03215 • Published 26 days ago • 93
MM-HELIX: Boosting Multimodal Long-Chain Reflective Reasoning with Holistic Platform and Adaptive Hybrid Policy Optimization Paper • 2510.08540 • Published 20 days ago • 108
Less is More: Recursive Reasoning with Tiny Networks Paper • 2510.04871 • Published 23 days ago • 455
Front-Loading Reasoning: The Synergy between Pretraining and Post-Training Data Paper • 2510.03264 • Published Sep 26 • 23
Bridging the Gap Between Promise and Performance for Microscaling FP4 Quantization Paper • 2509.23202 • Published Sep 27 • 26
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models Paper • 2509.26388 • Published 29 days ago • 26