FastVLM • Collection • Efficient Vision Encoding for Vision Language Models • 9 items • Updated Sep 2, 2025 • 106
MobileCLIP2 • Collection • MobileCLIP2: Mobile-friendly image-text models with SOTA zero-shot capabilities trained on DFNDR-2B • 37 items • Updated Sep 18, 2025 • 57
Nomic Embed Vision • Collection • Vision Encoders aligned to Nomic Embed Text making Nomic Embed multimodal! • 2 items • Updated Jun 5, 2024 • 10
Apollo: An Exploration of Video Understanding in Large Multimodal Models • Paper • 2412.10360 • Published Dec 13, 2024 • 147