Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published 9 days ago • 79
GoRL: An Algorithm-Agnostic Framework for Online Reinforcement Learning with Generative Policies Paper • 2512.02581 • Published 26 days ago • 14
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward Paper • 2511.20561 • Published Nov 25 • 31
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems Paper • 2508.01415 • Published Aug 2 • 7
ThinkAct: Vision-Language-Action Reasoning via Reinforced Visual Latent Planning Paper • 2507.16815 • Published Jul 22 • 40
SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data Article • Published Jun 3 • 299
RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete Paper • 2502.21257 • Published Feb 28 • 2
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 306
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 301
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments Paper • 2503.00729 • Published Mar 2 • 3
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language Models via Mixture-of-LoRAs Paper • 2503.01743 • Published Mar 3 • 89
DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References Paper • 2502.09614 • Published Feb 13 • 9
STMA: A Spatio-Temporal Memory Agent for Long-Horizon Embodied Task Planning Paper • 2502.10177 • Published Feb 14 • 6
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated Jul 21 • 550
MSI-Agent: Incorporating Multi-Scale Insight into Embodied Agents for Superior Planning and Decision-Making Paper • 2409.16686 • Published Sep 25, 2024 • 10