`all-mpnet-base-v2` Complete On-device AI Study
by yeonseok-zeticai
A comprehensive mobile deployment study of sentence-transformers/all-mpnet-base-v2, with performance benchmarks showing that SOTA sentence embeddings are now deployment-ready on-device.
🎯 Model Overview:
- Architecture: MPNet (masked and permuted language model pre-training)
- Output: 768-dimensional dense embeddings
- Specialty: Sentence similarity, semantic search, clustering
- Advantage: Superior semantic understanding vs 384D alternatives
📊 Mobile Performance Results:
(1) Latency Metrics:
- NPU (Best): 2.36ms average inference
- GPU: 8.46ms average
- CPU: 16.08ms average
- NPU advantage: up to 20.25x speedup over the CPU baseline
(2) Memory Efficiency:
- Model Size: 418.04 MB (production-optimized)
- Runtime Memory: 348.02 MB peak consumption
- Load memory range: 4-677 MB across device categories
- Inference memory range: 332-673 MB
(3) Accuracy Preservation:
- FP16 precision: 31.14 dB output fidelity maintained
- Quantized Mode: Available for memory constraints
- Embedding Quality: Production-grade semantic matching
🚀 Why all-mpnet-base-v2?
(1) Architecture Advantages:
- Combines the strengths of BERT (masked LM) and XLNet (permuted LM)
- 768-dimensional embeddings for richer representations
- Better performance on complex semantic tasks
- Industry-standard for high-quality sentence embeddings
(2) Comparison to Alternatives:
- vs all-MiniLM-L6-v2: twice the embedding dimensionality (768 vs 384), higher quality at higher compute cost
- vs sentence-t5: faster inference at similar quality
- vs paraphrase-* models: more general-purpose, with a better quality/coverage balance
🔗 Resources:
Complete Study: https://mlange.zetic.ai/p/Steve/all-mpnet-base-v2
🛠️ On-device AI Integration Steps:
(1) Install the SDK:
- Android: Add Gradle dependency
- iOS: Add Swift Package via SPM
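For the Android path, here is a minimal sketch of the dependency declaration, assuming a Gradle Kotlin DSL build file. The artifact coordinate and version below are placeholders rather than the published ZETIC.MLange coordinate, so copy the exact line from the MLange documentation.

```kotlin
// app/build.gradle.kts
// NOTE: placeholder coordinate and version -- replace with the artifact
// published in the ZETIC.MLange docs for your SDK release.
dependencies {
    implementation("com.zeticai.mlange:mlange:x.y.z")
}
```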
(2) Initialize the Model:
- Provide your ZETIC.MLange API key
- Specify the model name: "Steve/all-mpnet-base-v2" (see the sketch after step (3))
(3) Generate Embeddings:
- Pass text as a string array
- Receive one 768-dimensional float vector per input
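Steps (2) and (3) combined, as a Kotlin sketch. The class and member names (`ZeticMLangeModel`, `run`, `outputBuffers`) and the assumption that the SDK accepts raw strings and returns pooled 768-dimensional float buffers are assumptions modeled on the steps above, not a confirmed API; the linked study page has the exact integration code.

```kotlin
import android.content.Context
import java.nio.ByteOrder

// Hypothetical wrapper for steps (2)-(3); real MLange class/method names may differ.
class MpnetEncoder(context: Context, apiKey: String) {
    // Assumed constructor: Android context, ZETIC.MLange API key, model name.
    private val model = ZeticMLangeModel(context, apiKey, "Steve/all-mpnet-base-v2")

    /** Encodes each sentence into one 768-dimensional embedding. */
    fun encode(sentences: Array<String>): List<FloatArray> {
        model.run(sentences)  // assumed: SDK tokenizes and pools internally
        return model.outputBuffers.map { buffer ->
            val embedding = FloatArray(768)
            buffer.order(ByteOrder.nativeOrder()).asFloatBuffer().get(embedding)
            embedding
        }
    }
}
```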
(4) Compute Similarities:
- Use cosine similarity for semantic matching (see the sketch below)
- Use an approximate nearest neighbor (ANN) index for large-scale search
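Step (4) is plain vector math and does not depend on the SDK. Below is a minimal Kotlin sketch of cosine similarity plus a brute-force top-k search over a small corpus; for large-scale search, the linear scan would be replaced by an ANN index (e.g., HNSW).

```kotlin
import kotlin.math.sqrt

/** Cosine similarity between two equal-length embedding vectors. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embedding dimensions must match" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

/** Brute-force top-k semantic search; swap in an ANN index for large corpora. */
fun topK(query: FloatArray, corpus: List<FloatArray>, k: Int = 5): List<Pair<Int, Float>> =
    corpus.mapIndexed { index, embedding -> index to cosineSimilarity(query, embedding) }
        .sortedByDescending { it.second }
        .take(k)
```

Combined with the encoder sketch above, `topK(encoder.encode(arrayOf(query))[0], corpusEmbeddings)` returns the indices and scores of the closest matches.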





