Vision - a melvindave Collection

updated 15 days ago

Open VLM Leaderboard
Running on CPU Upgrade • 952 likes
VLMEvalKit Evaluation Results Collection

DeepSeek OCR Demo
Running on Zero • Featured • 306 likes
Try out DeepSeek-OCR on your PDFs or images

Multimodal OCR3
Running on Zero • MCP • 53 likes
nanonets2-ocr / chandra-ocr / dots.ocr / olm-ocr2

Qwen/Qwen3-VL-30B-A3B-Instruct
Image-Text-to-Text • 31B • Updated 29 days ago • 1.67M downloads • 463 likes
Note: running locally in LM Studio

Qwen/Qwen3-VL-235B-A22B-Thinking
Image-Text-to-Text • 236B • Updated 29 days ago • 25k downloads • 351 likes
Note: inference available

Qwen/Qwen3-VL-235B-A22B-Instruct
Image-Text-to-Text • 236B • Updated 29 days ago • 244k downloads • 341 likes
Note: inference available

Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text • 8B • Updated Apr 6 • 2.64M downloads • 1.41k likes

zai-org/GLM-4.6V
Image-Text-to-Text • 108B • Updated 17 days ago • 94.4k downloads • 341 likes

VLM Object Understanding
Running on Zero • Featured • 111 likes
Explore object detection, visual grounding, keypoint Detecti