YOLOv11 Complete On-Device Study: NPU vs GPU vs CPU Across All Model Variants
We've just completed comprehensive benchmarking of the entire YOLOv11 family on ZETIC.MLange. Here's what every ML engineer needs to know.
Key Findings Across 5 Model Variants (XL to Nano):
1. NPU Dominance in Efficiency:
- YOLOv11n: 1.72ms on NPU vs 53.60ms on CPU (31x faster)
- Memory footprint: 0-65MB across all variants
- Consistent sub-10ms NPU inference even on the XL model
2. The Sweet Spot - YOLOv11s:
- NPU: 3.23ms @ 95.57% mAP
- Perfect balance: 36MB model with production-ready speed
- 10x faster than GPU, 30x faster than CPU
3. Surprising Discovery: Medium models (YOLOv11m) show an unusual GPU performance pattern: the NPU outperforms the GPU by nearly 4x (9.55ms vs 35.82ms), suggesting current GPU kernels aren't optimized for mid-size architectures.
4. Production Insights:
- XL/Large: GPU still competitive for batch processing
- Small/Nano: NPU absolutely crushes everything else
- Memory scaling: linear from 10MB (Nano) to 217MB (XL)
- Accuracy plateau: 95.5-95.7% mAP across S/M/L variants
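For a quick sanity check on the headline ratios, the latencies quoted above can be turned into speedup factors with a few lines of Python. Only values explicitly stated in this post are included; backends that weren't quoted for a variant are simply omitted rather than guessed.

```python
# Per-variant latencies (ms) quoted in this post; missing backends are
# intentionally left out because the post does not state them.
latencies_ms = {
    "yolov11n": {"npu": 1.72, "cpu": 53.60},
    "yolov11s": {"npu": 3.23},
    "yolov11m": {"npu": 9.55, "gpu": 35.82},
}

def speedup(variant: str, fast: str, slow: str) -> float:
    """Ratio of the slower backend's latency to the faster backend's."""
    t = latencies_ms[variant]
    return t[slow] / t[fast]

print(f"YOLOv11n NPU vs CPU: {speedup('yolov11n', 'npu', 'cpu'):.1f}x")  # 31.2x
print(f"YOLOv11m NPU vs GPU: {speedup('yolov11m', 'npu', 'gpu'):.1f}x")  # 3.8x
```

Running the arithmetic confirms the nano-model claim (53.60 / 1.72 ≈ 31x) and shows the medium-model gap is closer to 3.8x than a flat 4x.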
Real-world Impact: For edge deployment, YOLOv11s on NPU delivers server-level accuracy at embedded speeds. This changes everything for real-time applications.
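One way to operationalize these numbers is a simple budget-driven variant picker. This is a hypothetical sketch, not part of ZETIC.MLange: the profile table below contains only the NPU latencies and model sizes quoted in this post (nano and small), and anything not quoted is left out rather than invented.

```python
from typing import Optional

# NPU latency (ms) and model size (MB) as quoted in this post.
# Variants whose size or latency wasn't stated are omitted, not guessed.
NPU_PROFILES = {
    "yolov11n": {"latency_ms": 1.72, "size_mb": 10.0},
    "yolov11s": {"latency_ms": 3.23, "size_mb": 36.0},
}

def pick_variant(latency_budget_ms: float, size_budget_mb: float) -> Optional[str]:
    """Return the largest variant fitting both budgets, or None if none fit."""
    candidates = [
        name for name, p in NPU_PROFILES.items()
        if p["latency_ms"] <= latency_budget_ms and p["size_mb"] <= size_budget_mb
    ]
    # Dict insertion order goes nano -> small, so the last candidate
    # is the largest (and typically most accurate) model that fits.
    return candidates[-1] if candidates else None

print(pick_variant(5.0, 50.0))  # a 5ms / 50MB budget admits yolov11s
print(pick_variant(2.0, 50.0))  # a 2ms budget leaves only yolov11n
```

Under the post's numbers, a 5ms latency budget lands on YOLOv11s, consistent with the "sweet spot" finding above.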
The data speaks for itself. NPUs aren't the future - they're the present for efficient inference. Which variant fits your use case? Let's discuss in the comments.