StreamingVLM: Real-Time Understanding for Infinite Video Streams Paper • 2510.09608 • Published 19 days ago • 49
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published 23 days ago • 108
Code2Video: A Code-centric Paradigm for Educational Video Generation Paper • 2510.01174 • Published 28 days ago • 33
Robix: A Unified Model for Robot Interaction, Reasoning and Planning Paper • 2509.01106 • Published Sep 1 • 48
Draw-In-Mind: Learning Precise Image Editing via Chain-of-Thought Imagination Paper • 2509.01986 • Published Sep 2 • 4
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning Paper • 2509.02544 • Published Sep 2 • 123
HumanOmniV2: From Understanding to Omni-Modal Reasoning with Context Paper • 2506.21277 • Published Jun 26 • 15
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 43
UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning Paper • 2505.23380 • Published May 29 • 22
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 107
Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models Paper • 2505.16854 • Published May 22 • 11
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via Reinforcement Learning Paper • 2505.11896 • Published May 17 • 58