`all-mpnet-base-v2` Complete On-device AI Study

#43 · opened by yeonseok-zeticai

Comprehensive mobile deployment study of sentence-transformers/all-mpnet-base-v2, with performance benchmarks showing that SOTA sentence embeddings are now deployment-ready on mobile hardware.

🎯 Model Overview:

  • Architecture: MPNet (Masked and Permuted Pre-training)
  • Output: 768-dimensional dense embeddings
  • Specialty: Sentence similarity, semantic search, clustering
  • Advantage: Superior semantic understanding vs 384D alternatives

📊 Mobile Performance Results:
(1) Latency Metrics:

  • NPU (Best): 2.36ms average inference
  • GPU: 8.46ms average
  • CPU: 16.08ms average
  • NPU Advantage: ~6.8x faster than the CPU average quoted above (16.08 ms ÷ 2.36 ms); the full study reports up to 20.25x over its CPU baseline

(2) Memory Efficiency:

  • Model Size: 418.04 MB (production-optimized)
  • Runtime Memory: 348.02 MB peak consumption
  • Load Range: 4-677 MB across device categories
  • Inference Range: 332-673 MB

(3) Accuracy Preservation:

  • FP16 Precision: 31.14 dB output fidelity maintained (metric sketched after this list)
  • Quantized Mode: Available for tighter memory budgets
  • Embedding Quality: Production-grade semantic matching
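
The dB figure above is reported without a definition here. Assuming it is an output signal-to-noise ratio of the FP16 on-device embeddings against a full-precision reference (an assumption, not something stated in this post), such a metric is computed roughly as in the sketch below; ~31 dB means the error energy is about a thousandth of the embedding energy.

```kotlin
import kotlin.math.log10

// Illustrative sketch only: SNR in dB between a full-precision reference
// embedding and its FP16 (or quantized) counterpart.
// 31.14 dB corresponds to a signal-to-noise ratio of roughly 1300:1.
fun snrDb(reference: FloatArray, test: FloatArray): Double {
    require(reference.size == test.size) { "Embeddings must have the same dimension" }
    var signal = 0.0
    var noise = 0.0
    for (i in reference.indices) {
        val r = reference[i].toDouble()
        val d = r - test[i]
        signal += r * r   // energy of the reference embedding
        noise += d * d    // energy of the FP16 error
    }
    return 10.0 * log10(signal / noise)
}
```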

🚀 Why all-mpnet-base-v2?
(1) Architecture Advantages:

  • Combines strengths of BERT (bidirectional) and XLNet (permuted LM)
  • 768-dimensional embeddings for richer representations
  • Better performance on complex semantic tasks
  • Industry-standard for high-quality sentence embeddings

(2) Comparison to Alternatives:

  • vs all-MiniLM-L6-v2: 2x dimensions (768 vs 384), better quality
  • vs sentence-t5: Faster inference, similar quality
  • vs paraphrase models: More general-purpose, better balance

🔗 Resources:

Complete Study: https://mlange.zetic.ai/p/Steve/all-mpnet-base-v2

On-device AI Integration Steps for `all-mpnet-base-v2`

  1. Install SDK

    • Android: Add Gradle dependency
    • iOS: Add Swift Package via SPM
  2. Initialize Model

    • Provide your ZETIC.MLange API key
    • Specify the model name: "Steve/all-mpnet-base-v2" (see the Kotlin sketch after these steps)
  3. Generate Embeddings

    • Pass text as string array
    • Receive 768D float vectors
  4. Compute Similarities

    • Use cosine similarity for semantic matching (sketched after these steps)
    • Use approximate nearest neighbor (ANN) search for large-scale retrieval
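
A minimal Kotlin sketch of steps 2–3. The names used here (`SentenceEncoder`, `MlangeMpnetEncoder`) are illustrative stand-ins; the actual model-loading and inference calls come from the ZETIC.MLange Android documentation, not from this sketch. The only details taken from the steps above are the inputs (API key, model name "Steve/all-mpnet-base-v2", a batch of strings) and the output (one 768-dimensional float vector per sentence).

```kotlin
// Stand-in interface so the sketch compiles; map it onto the real SDK classes.
interface SentenceEncoder {
    /** Returns one 768-dimensional embedding per input sentence. */
    fun encode(sentences: List<String>): List<FloatArray>
}

class MlangeMpnetEncoder(
    private val apiKey: String,                            // your ZETIC.MLange API key
    private val modelName: String = "Steve/all-mpnet-base-v2"
) : SentenceEncoder {
    override fun encode(sentences: List<String>): List<FloatArray> {
        // Hypothetical: load the model via the MLange SDK with (apiKey, modelName),
        // run inference on the batch, and return the raw 768-D float outputs.
        TODO("Wire up the ZETIC.MLange model load and inference call here")
    }
}
```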
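
Step 4 is SDK-independent: once you have the embeddings, cosine similarity and a brute-force top-k search are a few lines of plain Kotlin (the helper names below are illustrative).

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embeddings, e.g. the 768-dimensional vectors
// returned in step 3. Ranges from -1 (opposite) to 1 (identical direction).
fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    require(a.size == b.size) { "Embedding dimensions must match" }
    var dot = 0f
    var normA = 0f
    var normB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}

// Brute-force semantic search: rank precomputed corpus embeddings against a
// query embedding and return the indices of the top-k matches.
fun topKMatches(query: FloatArray, corpus: List<FloatArray>, k: Int = 5): List<Int> =
    corpus.indices
        .sortedByDescending { cosineSimilarity(query, corpus[it]) }
        .take(k)
```

Brute-force ranking is fine for a few thousand on-device documents; beyond that, normalize the corpus embeddings once and move to an ANN index, as the last step suggests.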
