`all-mpnet-base-v2` Complete On-device AI Study
by yeonseok-zeticai
A comprehensive mobile deployment study of sentence-transformers/all-mpnet-base-v2, with performance benchmarks showing that SOTA sentence embeddings are now deployment-ready on-device.
🎯 Model Overview:
- Architecture: MPNet (masked and permuted language model pre-training)
- Output: 768-dimensional dense embeddings
- Specialty: Sentence similarity, semantic search, clustering
- Advantage: Superior semantic understanding vs 384D alternatives
📊 Mobile Performance Results:
(1) Latency Metrics:
- NPU (Best): 2.36ms average inference
- GPU: 8.46ms average
- CPU: 16.08ms average
- NPU advantage: up to 20.25x speedup over the CPU baseline
(2) Memory Efficiency:
- Model Size: 418.04 MB (production-optimized)
- Runtime Memory: 348.02 MB peak consumption
- Load memory range: 4-677 MB across device categories
- Inference memory range: 332-673 MB
(3) Accuracy Preservation:
- FP16 precision: 31.14 dB output fidelity maintained
- Quantized Mode: Available for memory constraints
- Embedding Quality: Production-grade semantic matching
🚀 Why all-mpnet-base-v2?
(1) Architecture Advantages:
- Combines the strengths of BERT (masked LM) and XLNet (permuted LM)
- 768-dimensional embeddings for richer representations
- Better performance on complex semantic tasks
- Industry-standard for high-quality sentence embeddings
(2) Comparison to Alternatives:
- vs all-MiniLM-L6-v2: twice the embedding dimensionality (768 vs 384), higher quality at higher compute cost
- vs sentence-t5: faster inference at similar quality
- vs paraphrase-* models: more general-purpose, with a better quality/coverage balance
🔗 Resources:
Complete Study: https://mlange.zetic.ai/p/Steve/all-mpnet-base-v2
🛠️ On-device AI Integration Steps:
(1) Install the SDK:
- Android: Add Gradle dependency
- iOS: Add Swift Package via SPM
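For the Android path, here is a minimal sketch of the dependency declaration, assuming a Gradle Kotlin DSL build file. The artifact coordinate and version below are placeholders rather than the published ZETIC.MLange coordinate, so copy the exact line from the MLange documentation.

```kotlin
// app/build.gradle.kts
// NOTE: placeholder coordinate and version -- replace with the artifact
// published in the ZETIC.MLange docs for your SDK release.
dependencies {
    implementation("com.zeticai.mlange:mlange:x.y.z")
}
```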
(2) Initialize the Model:
- Provide your ZETIC.MLange API key
- Specify the model name: "Steve/all-mpnet-base-v2" (see the sketch after step (3))
(3) Generate Embeddings:
- Pass text as a string array
- Receive one 768-dimensional float vector per input
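Steps (2) and (3) combined, as a Kotlin sketch. The class and member names (`ZeticMLangeModel`, `run`, `outputBuffers`) and the assumption that the SDK accepts raw strings and returns pooled 768-dimensional float buffers are assumptions modeled on the steps above, not a confirmed API; the linked study page has the exact integration code.

```kotlin
import android.content.Context
import java.nio.ByteOrder

// Hypothetical wrapper for steps (2)-(3); real MLange class/method names may differ.
class MpnetEncoder(context: Context, apiKey: String) {
    // Assumed constructor: Android context, ZETIC.MLange API key, model name.
    private val model = ZeticMLangeModel(context, apiKey, "Steve/all-mpnet-base-v2")

    /** Encodes each sentence into one 768-dimensional embedding. */
    fun encode(sentences: Array<String>): List<FloatArray> {
        model.run(sentences)  // assumed: SDK tokenizes and pools internally
        return model.outputBuffers.map { buffer ->
            val embedding = FloatArray(768)
            buffer.order(ByteOrder.nativeOrder()).asFloatBuffer().get(embedding)
            embedding
        }
    }
}
```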
(4) Compute Similarities:
- Use cosine similarity for semantic matching (see the sketch below)
- Use an approximate nearest neighbor (ANN) index for large-scale search
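Step (4) is plain vector math and does not depend on the SDK. Below is a minimal Kotlin sketch of cosine similarity plus a brute-force top-k search over a small corpus; for large-scale search, the linear scan would be replaced by an ANN index (e.g., HNSW).

```kotlin
import kotlin.math.sqrt

/** Cosine similarity between two equal-length embedding vectors. */
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embedding dimensions must match" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

/** Brute-force top-k semantic search; swap in an ANN index for large corpora. */
fun topK(query: FloatArray, corpus: List<FloatArray>, k: Int = 5): List<Pair<Int, Float>> =
    corpus.mapIndexed { index, embedding -> index to cosineSimilarity(query, embedding) }
        .sortedByDescending { it.second }
        .take(k)
```

Combined with the encoder sketch above, `topK(encoder.encode(arrayOf(query))[0], corpusEmbeddings)` returns the indices and scores of the closest matches.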





