PixelRefer: A Unified Framework for Spatio-Temporal Object Referring with Arbitrary Granularity Paper • 2510.23603 • Published 7 days ago • 21
Scaling Language-Centric Omnimodal Representation Learning Paper • 2510.11693 • Published 21 days ago • 97
High-Fidelity Simulated Data Generation for Real-World Zero-Shot Robotic Manipulation Learning with Gaussian Splatting Paper • 2510.10637 • Published 22 days ago • 12
TIME: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenarios Paper • 2505.12891 • Published May 19 • 10
Residual Off-Policy RL for Finetuning Behavior Cloning Policies Paper • 2509.19301 • Published Sep 23 • 18
MMR1 Collection • Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources • 1 item • Updated Sep 26 • 1
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25 • 101
RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation Paper • 2509.15212 • Published Sep 18 • 21
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28 • 75
Article RynnEC: Bringing MLLMs into Embodied World By Alibaba-DAMO-Academy and 6 others • Aug 14 • 7
Towards Affordance-Aware Robotic Dexterous Grasping with Human-like Priors Paper • 2508.08896 • Published Aug 12 • 10
Article SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data • Jun 3 • 272
Article RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation By Alibaba-DAMO-Academy and 9 others • Aug 11 • 28
VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning Paper • 2507.22607 • Published Jul 30 • 46
MiroMind-M1: An Open-Source Advancement in Mathematical Reasoning via Context-Aware Multi-Stage Policy Optimization Paper • 2507.14683 • Published Jul 19 • 131
Through the Valley: Path to Effective Long CoT Training for Small Language Models Paper • 2506.07712 • Published Jun 9 • 18
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published Jun 8 • 113
Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering Paper • 2505.23604 • Published May 29 • 23