Update README.md

README.md CHANGED
@@ -10,18 +10,11 @@ pinned: false
 license: mit
 ---
 
-# MAPSS:
-
-- **Perceptual Similarity (PS)**: Measures how similar separated outputs are to reference sources in perceptual embedding space
-- **Perceptual Matching (PM)**: Evaluates robustness against a comprehensive set of audio distortions
-- **Multiple embedding models**: Support for WavLM, Wav2Vec2, HuBERT, AST, and more
-- **Automatic output-to-reference matching**: Uses correlation-based Hungarian algorithm
-- **GPU-optimized processing**: Efficient batch processing with memory management
-- **Diffusion maps**: Advanced dimensionality reduction for perceptual space analysis
 
 ## Input Format
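The correlation-based Hungarian matching named in the feature list above can be sketched with `scipy.optimize.linear_sum_assignment`. This is a minimal illustration, not the MAPSS implementation itself (which may, for instance, correlate embeddings rather than raw waveforms):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_outputs_to_references(outputs, references):
    """Assign each separated output to a reference by maximizing
    pairwise waveform correlation (Hungarian algorithm)."""
    n = len(outputs)
    corr = np.zeros((n, n))
    for i, out in enumerate(outputs):
        for j, ref in enumerate(references):
            corr[i, j] = np.corrcoef(out, ref)[0, 1]  # Pearson correlation
    # linear_sum_assignment minimizes total cost, so negate to maximize
    rows, cols = linear_sum_assignment(-corr)
    return {int(r): int(c) for r, c in zip(rows, cols)}

# Toy check: feed the outputs in swapped order; the matcher recovers the permutation
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(1000), rng.standard_normal(1000)
assignment = match_outputs_to_references([s2, s1], [s1, s2])
print(assignment)  # {0: 1, 1: 0}
```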
@@ -47,22 +40,11 @@ your_mixture.zip
 ## Output Format
 
 The tool generates a ZIP file containing:
-- `ps_scores_{model}.csv`: PS scores for each speaker/source
-- `pm_scores_{model}.csv`: PM scores for each speaker/source
 - `params.json`: Experiment parameters used
 - `manifest_canonical.json`: File mapping and processing details
 
-### Score Interpretation
-
-- **PS Score**: Perceptual Similarity
-  - 1.0 = Perfect separation (output identical to reference)
-  - 0.5 = Moderate separation quality
-  - 0.0 = Poor separation (output closer to other sources)
-
-- **PM Score**: Perceptual Matching (robustness)
-  - 1.0 = Highly robust to distortions
-  - 0.5 = Moderate robustness
-  - 0.0 = Not robust (easily confused with distorted versions)
-
 ## Available Models
 
 | Model | Description | Default Layer | Use Case |
@@ -85,27 +67,6 @@ The tool generates a ZIP file containing:
 - 0.0 = No normalization
 - 1.0 = Full normalization (recommended)
 
-## How It Works
-
-1. **Feature Extraction**: Audio signals are processed through pre-trained self-supervised models to extract perceptual embeddings
-2. **Voice Activity Detection**: Automatic detection of voiced segments using energy-based masking
-3. **Diffusion Maps**: Embeddings are projected using diffusion maps for robust dimensionality reduction
-4. **PS Computation**: Measures Mahalanobis distance between separated outputs and references vs other sources
-5. **PM Computation**: Evaluates against comprehensive distortions including:
-   - Noise (white, pink, brown at various SNRs)
-   - Filtering (lowpass, highpass, notch, comb)
-   - Effects (reverb, echo, tremolo, vibrato)
-   - Distortions (clipping, pitch shift, time stretch)
-6. **Scoring**: Frame-level scores are computed and aggregated
-
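Steps 3 and 4 above (diffusion-map projection, then Mahalanobis scoring) can be illustrated with a minimal NumPy sketch. The function names, the median bandwidth heuristic, and the regularization constant are assumptions for illustration, not the exact MAPSS formulation:

```python
import numpy as np

def diffusion_map(X, eps=None, dim=2):
    """Project rows of X (frames x embedding dims) onto the leading
    non-trivial eigenvectors of a row-stochastic diffusion operator."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    if eps is None:
        eps = np.median(d2)                               # bandwidth heuristic (assumption)
    K = np.exp(-d2 / eps)                                 # Gaussian affinity kernel
    P = K / K.sum(axis=1, keepdims=True)                  # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    coords = vecs.real[:, order] * vals.real[order]       # eigenvalue-weighted coordinates
    return coords[:, 1:dim + 1]                           # drop the trivial eigenvector

def mahalanobis(x, mean, cov, reg=1e-6):
    """Mahalanobis distance with covariance regularization."""
    cov = cov + reg * np.eye(cov.shape[0])
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(1)
frames = rng.standard_normal((40, 8))   # 40 frames of 8-dim perceptual embeddings
Y = diffusion_map(frames, dim=2)
print(Y.shape)                          # (40, 2)
```

A PS-style score would then compare the Mahalanobis distance from an output's diffusion coordinates to its own reference against the distances to the interfering references.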
-## Technical Details
-
-- **Loudness normalization**: ITU-R BS.1770 standard (-23 LUFS)
-- **Frame-based processing**: 20ms windows with 20ms hop
-- **Correlation-based assignment**: Hungarian algorithm for optimal matching
-- **Memory optimization**: Batch processing with automatic GPU memory management
-- **Robust statistics**: Covariance regularization and outlier handling
-
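The frame-based processing described above (20 ms windows with a 20 ms hop, i.e. non-overlapping frames) and the energy-based voicing mask from the pipeline can be sketched as follows; the threshold value and function names are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, sr, win_ms=20.0, hop_ms=20.0):
    """Split a 1-D signal into frames; a 20 ms window with a 20 ms hop
    yields non-overlapping frames."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(x) - win) // hop)
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])

def energy_mask(frames, threshold_db=-40.0):
    """Keep frames whose energy is within threshold_db of the loudest frame
    (a simple stand-in for energy-based voice activity detection)."""
    energy = (frames ** 2).mean(axis=1)
    db = 10 * np.log10(energy / energy.max() + 1e-12)
    return db > threshold_db

sr = 16000
t = np.arange(sr) / sr                  # 1 s of audio
x = np.sin(2 * np.pi * 220 * t)
x[: sr // 2] = 0.0                      # first half is silence
frames = frame_signal(x, sr)            # 50 frames of 320 samples each
mask = energy_mask(frames)
print(frames.shape, int(mask.sum()))    # (50, 320) 25
```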
 ## Citation
 
 If you use MAPSS in your research, please cite:
@@ -122,10 +83,7 @@ If you use MAPSS in your research, please cite:
 ## Limitations
 
-- Processing time scales with audio length and model size
-- Memory requirements depend on number of sources and audio length
-- Currently optimized for speech separation (music separation support in development)
-- Maximum recommended sources: 10 per mixture
 
 ## License
@@ -134,4 +92,4 @@ Paper: CC-BY-4.0
 ## Support
 
-For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/
@@ -10,18 +10,11 @@ pinned: false
 license: mit
 ---
 
+# MAPSS: Manifold-based Assessment of Perceptual Source Separation
+
+Granular evaluation of speech and music source separation with the MAPSS measures:
+
+- **Perceptual Matching (PM)**: Measures how closely an output perceptually aligns with its reference. Range: 0-1, higher is better.
+- **Perceptual Similarity (PS)**: Measures how well an output is separated from its interfering references. Range: 0-1, higher is better.
 
 ## Input Format
 
@@ -47,22 +40,11 @@ your_mixture.zip
 ## Output Format
 
 The tool generates a ZIP file containing:
+- `ps_scores_{model}.csv`: PS scores for each speaker/source
+- `pm_scores_{model}.csv`: PM scores for each speaker/source
 - `params.json`: Experiment parameters used
 - `manifest_canonical.json`: File mapping and processing details
 
 ## Available Models
 
 | Model | Description | Default Layer | Use Case |
 
@@ -85,27 +67,6 @@ The tool generates a ZIP file containing:
 - 0.0 = No normalization
 - 1.0 = Full normalization (recommended)
 
 ## Citation
 
 If you use MAPSS in your research, please cite:
 
@@ -122,10 +83,7 @@ If you use MAPSS in your research, please cite:
 ## Limitations
 
+- Processing time scales with number of sources, audio length, and model size
 
 ## License
 
@@ -134,4 +92,4 @@ Paper: CC-BY-4.0
 ## Support
 
+For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/amir-ivry/MAPSS-measures).