Alchemist: Unlocking Efficiency in Text-to-Image Model Training via Meta-Gradient Data Selection Paper • 2512.16905 • Published 6 days ago • 30
DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning Paper • 2512.12799 • Published 10 days ago • 10
TokBench: Evaluating Your Visual Tokenizer before Visual Generation Paper • 2505.18142 • Published May 23 • 2
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18 • 111
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9 • 58
Mini-o3 Collection Scaling Up Reasoning Patterns and Interaction Turns for Visual Search • 7 items • Updated Sep 9 • 1
VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning Paper • 2507.13348 • Published Jul 17 • 77
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects Paper • 2407.16696 • Published Jul 23, 2024