ViCO: A Training Strategy towards Semantic Aware Dynamic High-Resolution Paper • 2510.12793 • Published 14 days ago • 2
InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models Paper • 2510.11341 • Published 16 days ago • 33
Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning Paper • 2510.11027 • Published 16 days ago • 19
NaViL: Rethinking Scaling Properties of Native Multimodal Large Language Models under Data Constraints Paper • 2510.08565 • Published 19 days ago • 19
OpenGVLab/InternVL3_5-1B-Flash Image-Text-to-Text • 1B • Updated about 1 month ago • 1.1k • 3
OpenGVLab/InternVL3_5-2B-Flash Image-Text-to-Text • 2B • Updated about 1 month ago • 3.71k • 3
OpenGVLab/InternVL3_5-14B-Flash Image-Text-to-Text • 15B • Updated about 1 month ago • 464 • 5
OpenGVLab/InternVL3_5-30B-A3B-Flash Image-Text-to-Text • 31B • Updated about 1 month ago • 2.49k • 5
OpenGVLab/InternVL3_5-241B-A28B-Flash Image-Text-to-Text • 242B • Updated about 1 month ago • 127 • 4
InternVL3.5-Flash Collection • InternVL3.5-Flash is a fast variant of InternVL3.5 that uses a semantic-aware dynamic high-resolution strategy. • 9 items • Updated 14 days ago • 6
OpenGVLab/InternVL3_5-38B-Flash Image-Text-to-Text • 40B • Updated about 1 month ago • 199 • 5
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe Paper • 2509.18154 • Published Sep 16 • 49
Qwen/Qwen3-VL-235B-A22B-Thinking Image-Text-to-Text • 236B • Updated 25 days ago • 26.2k • 309
Qwen/Qwen3-VL-235B-A22B-Instruct Image-Text-to-Text • 236B • Updated 25 days ago • 72.8k • 305
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18 • 109