view article Article nanoVLM: The simplest repository to train your VLM in pure PyTorch +5 May 21, 2025 • 247
gpt-oss Collection Open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. • 2 items • Updated Aug 7, 2025 • 398
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning Paper • 2506.09985 • Published Jun 11, 2025 • 29
SciVer: Evaluating Foundation Models for Multimodal Scientific Claim Verification Paper • 2506.15569 • Published Jun 18, 2025 • 12
Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published May 30, 2025 • 80
VLM-R^3: Region Recognition, Reasoning, and Refinement for Enhanced Multimodal Chain-of-Thought Paper • 2505.16192 • Published May 22, 2025 • 12
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20, 2025 • 16
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning Paper • 2505.15966 • Published May 21, 2025 • 53
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231 • Published May 20, 2025 • 52
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Paper • 2505.11049 • Published May 16, 2025 • 60
Understanding and Mitigating Toxicity in Image-Text Pretraining Datasets: A Case Study on LLaVA Paper • 2505.06356 • Published May 9, 2025 • 3
Aya Vision: Advancing the Frontier of Multilingual Multimodality Paper • 2505.08751 • Published May 13, 2025 • 13
Skywork-VL Reward: An Effective Reward Model for Multimodal Understanding and Reasoning Paper • 2505.07263 • Published May 12, 2025 • 30
MathCoder-VL: Bridging Vision and Code for Enhanced Multimodal Mathematical Reasoning Paper • 2505.10557 • Published May 15, 2025 • 47
Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection Paper • 2505.02393 • Published May 5, 2025 • 3
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges Paper • 2505.04769 • Published May 7, 2025 • 9