Spaces: AIvry/MAPSS-measures

AIvry committed on Sep 14

Commit

4ee0d45

verified ·

1 Parent(s): f76a9ce

Upload 2 files

Files changed (2)

README.md +129 -7
requirements.txt +26 -0

README.md CHANGED

@@ -1,14 +1,136 @@
 ---
-title: MAPSS Measures
-emoji: 🔥
-colorFrom: green
-colorTo: indigo
+title: MAPSS Multi Source Audio Perceptual Separation Scores
+emoji: 🎵
+colorFrom: blue
+colorTo: purple
 sdk: gradio
-sdk_version: 5.45.0
+sdk_version: 4.0.0
 app_file: app.py
 pinned: false
 license: mit
-short_description: Granular leakage and distortion metrics in source separation
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# MAPSS: Multi-source Audio Perceptual Separation Scores
+Evaluate audio source separation quality using Perceptual Similarity (PS) and Perceptual Matching (PM) metrics.
+## Features
+- **Perceptual Similarity (PS)**: Measures how similar separated outputs are to reference sources in perceptual embedding space
+- **Perceptual Matching (PM)**: Evaluates robustness against a comprehensive set of audio distortions
+- **Multiple embedding models**: Support for WavLM, Wav2Vec2, HuBERT, AST, and more
+- **Automatic output-to-reference matching**: Uses the Hungarian algorithm on output/reference correlations
+- **GPU-optimized processing**: Efficient batch processing with memory management
+- **Diffusion maps**: Advanced dimensionality reduction for perceptual space analysis
+## Input Format
+Upload a ZIP file containing:
+```
+your_mixture.zip
+├── references/       # Original clean sources
+│   ├── speaker1.wav
+│   ├── speaker2.wav
+│   └── ...
+└── outputs/         # Separated outputs from your algorithm
+    ├── separated1.wav
+    ├── separated2.wav
+    └── ...
+```
+### Audio Requirements
+- Format: WAV files
+- Sample rate: Any (automatically resampled to 16 kHz)
+- Channels: Mono or stereo (converted to mono)
+- Number of files: Equal number of references and outputs
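
The layout above can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_mapss_zip` and its dict-of-bytes interface are hypothetical and not part of the Space's code. It only checks the two rules stated above (WAV extension, equal file counts) and does not inspect the audio itself.

```python
import io
import zipfile


def build_mapss_zip(references, outputs):
    """Pack WAV payloads into the references/ and outputs/ layout.

    `references` and `outputs` map file names to raw WAV bytes.
    (Hypothetical helper, shown for illustration only.)
    """
    if len(references) != len(outputs):
        raise ValueError("need the same number of references and outputs")
    for name in list(references) + list(outputs):
        if not name.lower().endswith(".wav"):
            raise ValueError(f"not a WAV file: {name}")

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # Clean sources go under references/, separated signals under outputs/.
        for name, payload in references.items():
            archive.writestr(f"references/{name}", payload)
        for name, payload in outputs.items():
            archive.writestr(f"outputs/{name}", payload)
    return buffer.getvalue()
```

Writing the returned bytes to `your_mixture.zip` gives a file with the expected structure.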
+## Output Format
+The tool generates a ZIP file containing:
+- `ps_scores_{model}.csv`: PS scores for each speaker/source (0-1, higher is better)
+- `pm_scores_{model}.csv`: PM scores for each speaker/source (0-1, higher is better)
+- `params.json`: Experiment parameters used
+- `manifest_canonical.json`: File mapping and processing details
+### Score Interpretation
+- **PS Score**: Perceptual Similarity
+  - 1.0 = Perfect separation (output identical to reference)
+  - 0.5 = Moderate separation quality
+  - 0.0 = Poor separation (output closer to other sources)
+- **PM Score**: Perceptual Matching (robustness)
+  - 1.0 = Highly robust to distortions
+  - 0.5 = Moderate robustness
+  - 0.0 = Not robust (easily confused with distorted versions)
+## Available Models
+| Model | Description | Default Layer | Use Case |
+|-------|-------------|---------------|----------|
+| `raw` | Raw waveform features | N/A | Baseline comparison |
+| `wavlm` | WavLM Large | 24 | Best overall performance |
+| `wav2vec2` | Wav2Vec2 Large | 24 | Strong performance |
+| `hubert` | HuBERT Large | 24 | Good for speech |
+| `wavlm_base` | WavLM Base | 12 | Faster, good quality |
+| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster processing |
+| `hubert_base` | HuBERT Base | 12 | Faster for speech |
+| `wav2vec2_xlsr` | Wav2Vec2 XLSR-53 | 24 | Multilingual |
+| `ast` | Audio Spectrogram Transformer | 12 | General audio |
+## Parameters
+- **Model**: Select the embedding model for feature extraction
+- **Layer**: Which transformer layer to use (auto-selected by default)
+- **Alpha**: Diffusion maps parameter (0.0-1.0, default: 1.0)
+  - 0.0 = No normalization
+  - 1.0 = Full normalization (recommended)
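
To make the role of Alpha concrete, here is a small sketch of the alpha normalization step, assuming the standard diffusion-maps construction with a Gaussian kernel. The function name, the kernel, and the bandwidth `epsilon` are assumptions for illustration; the README does not state which kernel or bandwidth the Space uses.

```python
import math


def alpha_normalized_markov(points, epsilon, alpha):
    """Return a row-stochastic diffusion matrix for a list of vectors.

    Assumed construction: Gaussian kernel, density normalization by
    (d_i * d_j) ** alpha, then row normalization.
    """
    n = len(points)

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    # Gaussian affinity between every pair of points.
    kernel = [[math.exp(-sq_dist(points[i], points[j]) / epsilon)
               for j in range(n)] for i in range(n)]
    # Density estimate: row sums of the kernel.
    density = [sum(row) for row in kernel]
    # Alpha normalization; alpha = 0 leaves the kernel unchanged.
    normalized = [[kernel[i][j] / ((density[i] * density[j]) ** alpha)
                   for j in range(n)] for i in range(n)]
    # Row-normalize so each row is a probability distribution.
    return [[value / sum(row) for value in row] for row in normalized]
```

With `alpha=0.0` the result is just the row-normalized kernel, so the sampling density of the embeddings still influences the geometry; with `alpha=1.0` that density is divided out before row normalization.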
+## How It Works
+1. **Feature Extraction**: Audio signals are processed through pre-trained self-supervised models to extract perceptual embeddings
+2. **Voice Activity Detection**: Automatic detection of voiced segments using energy-based masking
+3. **Diffusion Maps**: Embeddings are projected using diffusion maps for robust dimensionality reduction
+4. **PS Computation**: Compares the Mahalanobis distance from each separated output to its matched reference against its distance to the other sources
+5. **PM Computation**: Evaluates against comprehensive distortions including:
+   - Noise (white, pink, brown at various SNRs)
+   - Filtering (lowpass, highpass, notch, comb)
+   - Effects (reverb, echo, tremolo, vibrato)
+   - Distortions (clipping, pitch shift, time stretch)
+6. **Scoring**: Frame-level scores are computed and aggregated
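
Step 2 above can be illustrated with a simple energy-based mask. This is a sketch only: the 16 kHz rate and 20 ms frame length come from this README, while the function name `energy_vad` and the relative threshold of -40 dB below the loudest frame are assumptions, not values taken from the Space's code.

```python
import math

SAMPLE_RATE = 16_000                   # README: audio is resampled to 16 kHz
FRAME_LEN = SAMPLE_RATE * 20 // 1000   # 20 ms frame = 320 samples


def energy_vad(samples, threshold_db=-40.0):
    """Return one boolean per non-overlapping 20 ms frame.

    A frame is marked active when its mean energy is within
    `threshold_db` of the loudest frame (assumed rule).
    """
    # Split into full frames; a trailing partial frame is dropped.
    frames = [samples[i:i + FRAME_LEN]
              for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]
    energies = [sum(x * x for x in frame) / FRAME_LEN for frame in frames]
    peak = max(energies, default=0.0)
    if peak <= 0.0:
        return [False] * len(frames)
    floor = peak * 10.0 ** (threshold_db / 10.0)
    return [energy >= floor for energy in energies]
```

Frame-level scores would then be computed only on frames where the mask is `True`.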
+## Technical Details
+- **Loudness normalization**: ITU-R BS.1770 standard (-23 LUFS)
+- **Frame-based processing**: 20 ms windows with a 20 ms hop (non-overlapping frames)
+- **Correlation-based assignment**: Hungarian algorithm for optimal matching
+- **Memory optimization**: Batch processing with automatic GPU memory management
+- **Robust statistics**: Covariance regularization and outlier handling
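
The correlation-based assignment can be sketched as follows. To stay dependency-free, this example searches all permutations instead of running the Hungarian algorithm; it returns the same optimal matching but costs O(n!) rather than O(n³), so in practice a solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice. The function names and the use of absolute Pearson correlation are assumptions made for illustration.

```python
import itertools
import math


def pearson(a, b):
    """Pearson correlation of two sequences, truncated to equal length."""
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a)
                     * sum((y - mean_b) ** 2 for y in b))
    return cov / norm if norm > 0 else 0.0


def match_outputs(references, outputs):
    """Return a list where entry i is the output index matched to reference i.

    Brute-force stand-in for the Hungarian algorithm; assumes the same
    number of references and outputs.
    """
    # Absolute value: a separated signal may come back with flipped sign.
    corr = [[abs(pearson(ref, out)) for out in outputs] for ref in references]
    best = max(
        itertools.permutations(range(len(outputs))),
        key=lambda perm: sum(corr[i][j] for i, j in enumerate(perm)),
    )
    return list(best)
```

For example, if the separator returns the sources in swapped order, `match_outputs` yields `[1, 0]`, and scores are then computed on the re-ordered pairs.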
+## Citation
+If you use MAPSS in your research, please cite:
+```bibtex
+@article{mapss2024,
+  title={MAPSS: Multi-source Audio Perceptual Separation Scores},
+  author={Your Name},
+  journal={arXiv preprint},
+  year={2024}
+}
+```
+## Limitations
+- Processing time scales with audio length and model size
+- Memory requirements depend on number of sources and audio length
+- Currently optimized for speech separation (music separation support in development)
+- Maximum recommended sources: 10 per mixture
+## License
+Code: MIT License
+Paper: CC-BY-4.0
+## Support
+For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/yourusername/mapss).

requirements.txt ADDED

@@ -0,0 +1,26 @@

+# Core dependencies
+gradio>=4.0.0
+torch>=2.0.0
+torchaudio>=2.0.0
+transformers>=4.35.0
+accelerate>=0.24.0
+# Audio processing
+librosa>=0.10.0
+soundfile>=0.12.0
+pyloudnorm>=0.1.0
+scipy>=1.11.0
+numpy>=1.24.0
+# Data handling
+pandas>=2.0.0
+# Model specific
+safetensors>=0.4.0
+sentencepiece>=0.1.99  # For some tokenizers
+# Optional optimizations
+triton>=2.1.0  # For faster attention if available
+# Memory management
+psutil>=5.9.0