Parameter-Inverted Image Pyramid Networks for Visual Perception and Multimodal Understanding • Paper • 2501.07783 • Published Jan 14, 2025 • 8
TIP: Tabular-Image Pre-training for Multimodal Classification with Incomplete Data • Paper • 2407.07582 • Published Jul 10, 2024 • 1
CLS-RL: Image Classification with Rule-Based Reinforcement Learning • Paper • 2503.16188 • Published Mar 20, 2025 • 13
RoMa v2: Harder Better Faster Denser Feature Matching • Paper • 2511.15706 • Published Nov 19, 2025 • 7
TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding • Paper • 2511.16595 • Published Nov 20, 2025 • 9
Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation • Paper • 2511.16671 • Published Nov 20, 2025 • 15
SRPO: Self-Referential Policy Optimization for Vision-Language-Action Models • Paper • 2511.15605 • Published Nov 19, 2025 • 22