YOLOv11 Complete On-Device Study: NPU vs GPU vs CPU Across All Model Variants
We've just completed comprehensive benchmarking of the entire YOLOv11 family on ZETIC.MLange. Here's what every ML engineer needs to know.
Key Findings Across 5 Model Variants (XL to Nano):
1. NPU Dominance in Efficiency:
- YOLOv11n: 1.72ms on NPU vs 53.60ms on CPU (31x faster)
- Memory footprint: 0-65MB across all variants
- Consistent sub-10ms NPU inference even on the XL model
2. The Sweet Spot - YOLOv11s:
- NPU: 3.23ms @ 95.57% mAP
- Perfect balance: 36MB model with production-ready speed
- 10x faster than GPU, 30x faster than CPU
3. Surprising Discovery: Medium models (YOLOv11m) show an unusual GPU performance pattern: the NPU outperforms the GPU by nearly 4x (9.55ms vs 35.82ms), suggesting current GPU kernels aren't optimized for mid-size architectures.
4. Production Insights:
- XL/Large: GPU still competitive for batch processing
- Small/Nano: NPU absolutely crushes everything else
- Memory scaling: linear from 10MB (Nano) to 217MB (XL)
- Accuracy plateau: 95.5-95.7% mAP across S/M/L variants
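For a quick sanity check on the headline ratios, the latencies quoted above can be turned into speedup factors with a few lines of Python. Only values explicitly stated in this post are included; backends that weren't quoted for a variant are simply omitted rather than guessed.

```python
# Per-variant latencies (ms) quoted in this post; missing backends are
# intentionally left out because the post does not state them.
latencies_ms = {
    "yolov11n": {"npu": 1.72, "cpu": 53.60},
    "yolov11s": {"npu": 3.23},
    "yolov11m": {"npu": 9.55, "gpu": 35.82},
}

def speedup(variant: str, fast: str, slow: str) -> float:
    """Ratio of the slower backend's latency to the faster backend's."""
    t = latencies_ms[variant]
    return t[slow] / t[fast]

print(f"YOLOv11n NPU vs CPU: {speedup('yolov11n', 'npu', 'cpu'):.1f}x")  # 31.2x
print(f"YOLOv11m NPU vs GPU: {speedup('yolov11m', 'npu', 'gpu'):.1f}x")  # 3.8x
```

Running the arithmetic confirms the nano-model claim (53.60 / 1.72 ≈ 31x) and shows the medium-model gap is closer to 3.8x than a flat 4x.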
Real-world Impact: For edge deployment, YOLOv11s on NPU delivers server-level accuracy at embedded speeds. This changes everything for real-time applications.
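One way to operationalize these numbers is a simple budget-driven variant picker. This is a hypothetical sketch, not part of ZETIC.MLange: the profile table below contains only the NPU latencies and model sizes quoted in this post (nano and small), and anything not quoted is left out rather than invented.

```python
from typing import Optional

# NPU latency (ms) and model size (MB) as quoted in this post.
# Variants whose size or latency wasn't stated are omitted, not guessed.
NPU_PROFILES = {
    "yolov11n": {"latency_ms": 1.72, "size_mb": 10.0},
    "yolov11s": {"latency_ms": 3.23, "size_mb": 36.0},
}

def pick_variant(latency_budget_ms: float, size_budget_mb: float) -> Optional[str]:
    """Return the largest variant fitting both budgets, or None if none fit."""
    candidates = [
        name for name, p in NPU_PROFILES.items()
        if p["latency_ms"] <= latency_budget_ms and p["size_mb"] <= size_budget_mb
    ]
    # Dict insertion order goes nano -> small, so the last candidate
    # is the largest (and typically most accurate) model that fits.
    return candidates[-1] if candidates else None

print(pick_variant(5.0, 50.0))  # a 5ms / 50MB budget admits yolov11s
print(pick_variant(2.0, 50.0))  # a 2ms budget leaves only yolov11n
```

Under the post's numbers, a 5ms latency budget lands on YOLOv11s, consistent with the "sweet spot" finding above.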
The data speaks for itself. NPUs aren't the future - they're the present for efficient inference. Which variant fits your use case? Let's discuss in the comments.