# NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX
## Quickstart

Run this model directly with [nexa-sdk](https://github.com/NexaAI/nexa-sdk) installed. In the nexa-sdk CLI, reference the model as `NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX`.
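Because the quantization is in MLX format, it can typically also be loaded on Apple-silicon Macs with the community mlx-vlm package. The snippet below is a minimal sketch under that assumption, not part of this model card: the exact `load`/`generate` argument names vary between mlx-vlm releases, and `photo.jpg` is a placeholder image path.

```python
# Minimal sketch: running the 4-bit MLX quant with the community mlx-vlm package.
# load() and generate() exist in mlx-vlm, but their exact signatures change between
# releases -- treat the arguments below as illustrative and check the package docs.
from mlx_vlm import load, generate

# Downloads the quantized weights from the Hub and builds the matching processor.
model, processor = load("NexaAI/Qwen2.5-VL-7B-Instruct-4bit-MLX")

prompt = "Describe this image in one sentence."
# "photo.jpg" is a placeholder path to a local image.
output = generate(model, processor, prompt, image="photo.jpg", max_tokens=128)
print(output)
```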
## Overview
In the five months since Qwen2-VL's release, numerous developers have built new models on top of the Qwen2-VL vision-language models and provided us with valuable feedback. During this period, we focused on building more useful vision-language models. Today, we are excited to introduce the latest addition to the Qwen family: Qwen2.5-VL.
### Key Enhancements
- Understand things visually: Qwen2.5-VL is not only proficient at recognizing common objects such as flowers, birds, fish, and insects, but is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.
- Being agentic: Qwen2.5-VL acts directly as a visual agent that can reason and dynamically direct tools, making it capable of computer use and phone use.
- Understanding long videos and capturing events: Qwen2.5-VL can comprehend videos of over 1 hour, and it now has the new ability to capture events by pinpointing the relevant video segments.
- Capable of visual localization in different formats: Qwen2.5-VL can accurately localize objects in an image by generating bounding boxes or points, and it can provide stable JSON outputs for coordinates and attributes (see the sketch after this list).
- Generating structured outputs: for data such as scans of invoices, forms, and tables, Qwen2.5-VL supports structured outputs of their contents, benefiting uses in finance, commerce, and beyond.
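As an illustration of the localization and structured-output behavior described above, the sketch below shows a chat-style grounding request and the kind of JSON reply the model is designed to produce. The message layout follows the Qwen2.5-VL chat format, but the prompt text, the `bbox_2d`/`label` keys in the commented reply, and all example values are illustrative assumptions rather than content from this card.

```python
# Sketch of a grounding request and the kind of JSON the model is expected to return.
# "shelf.jpg" is a placeholder image path; the reply shown in comments is illustrative.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "shelf.jpg"},
            {
                "type": "text",
                "text": "Locate every bottle in the image and report each one as JSON "
                        "with a bbox_2d (x1, y1, x2, y2 in pixels) and a label.",
            },
        ],
    }
]

# A well-formed answer typically looks like:
# [
#   {"bbox_2d": [112, 48, 210, 300], "label": "green glass bottle"},
#   {"bbox_2d": [365, 52, 460, 310], "label": "plastic water bottle"}
# ]
```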
## Benchmark Results
### Image benchmark
| Benchmark | InternVL2.5-8B | MiniCPM-o 2.6 | GPT-4o-mini | Qwen2-VL-7B | Qwen2.5-VL-7B | 
|---|---|---|---|---|---|
| MMMU_val | 56 | 50.4 | 60 | 54.1 | 58.6 |
| MMMU-Pro_val | 34.3 | - | 37.6 | 30.5 | 41.0 |
| DocVQA_test | 93 | 93 | - | 94.5 | 95.7 |
| InfoVQA_test | 77.6 | - | - | 76.5 | 82.6 |
| ChartQA_test | 84.8 | - | - | 83.0 | 87.3 |
| TextVQA_val | 79.1 | 80.1 | - | 84.3 | 84.9 |
| OCRBench | 822 | 852 | 785 | 845 | 864 |
| CC_OCR | 57.7 | - | - | 61.6 | 77.8 |
| MMStar | 62.8 | - | - | 60.7 | 63.9 |
| MMBench-V1.1-En_test | 79.4 | 78.0 | 76.0 | 80.7 | 82.6 |
| MMT-Bench_test | - | - | - | 63.7 | 63.6 |
| MMStar | 61.5 | 57.5 | 54.8 | 60.7 | 63.9 |
| MMVet_GPT-4-Turbo | 54.2 | 60.0 | 66.9 | 62.0 | 67.1 |
| HallBench_avg | 45.2 | 48.1 | 46.1 | 50.6 | 52.9 |
| MathVista_testmini | 58.3 | 60.6 | 52.4 | 58.2 | 68.2 |
| MathVision | - | - | - | 16.3 | 25.07 |
### Video Benchmarks
| Benchmark | Qwen2-VL-7B | Qwen2.5-VL-7B | 
|---|---|---|
| MVBench | 67.0 | 69.6 | 
| PerceptionTest_test | 66.9 | 70.5 |
| Video-MME_wo/w subs | 63.3/69.0 | 65.1/71.6 |
| LVBench | - | 45.3 |
| LongVideoBench | - | 54.7 |
| MMBench-Video | 1.44 | 1.79 |
| TempCompass | - | 71.7 |
| MLVU | - | 70.2 |
| CharadesSTA/mIoU | - | 43.6 |
### Agent benchmark
| Benchmarks | Qwen2.5-VL-7B | 
|---|---|
| ScreenSpot | 84.7 | 
| ScreenSpot Pro | 29.0 | 
| AITZ_EM | 81.9 | 
| Android Control High_EM | 60.1 | 
| Android Control Low_EM | 93.7 | 
| AndroidWorld_SR | 25.5 | 
| MobileMiniWob++_SR | 91.4 | 
## Reference

Original model card: [Qwen/Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct)