OneThinker: All-in-one Reasoning Model for Image and Video • Paper • 2512.03043 • Published Dec 2, 2025 • 32
VisMem: Latent Vision Memory Unlocks Potential of Vision-Language Models • Paper • 2511.11007 • Published Nov 14, 2025 • 15
Think with 3D: Geometric Imagination Grounded Spatial Reasoning from Limited Views • Paper • 2510.18632 • Published Oct 21, 2025 • 21
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning • Paper • 2508.06259 • Published Aug 8, 2025 • 2
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning • Paper • 2503.07523 • Published Mar 10, 2025 • 1
Visual Document Understanding and Question Answering: A Multi-Agent Collaboration Framework with Test-Time Scaling • Paper • 2508.03404 • Published Aug 5, 2025 • 4
RLFR: Extending Reinforcement Learning for LLMs with Flow Environment • Paper • 2510.10201 • Published Oct 11, 2025 • 35
Visual Multi-Agent System: Mitigating Hallucination Snowballing via Visual Flow • Paper • 2509.21789 • Published Sep 26, 2025 • 9
Qwen2-VL • Collection • Vision-language model series based on Qwen2 • 16 items • Updated 4 days ago • 227