Update README.md

README.md CHANGED
@@ -10,18 +10,11 @@ pinned: false
 license: mit
 ---
 
-# MAPSS:
-
-- **Perceptual Similarity (PS)**: Measures how similar separated outputs are to reference sources in perceptual embedding space
-- **Perceptual Matching (PM)**: Evaluates robustness against a comprehensive set of audio distortions
-- **Multiple embedding models**: Support for WavLM, Wav2Vec2, HuBERT, AST, and more
-- **Automatic output-to-reference matching**: Uses correlation-based Hungarian algorithm
-- **GPU-optimized processing**: Efficient batch processing with memory management
-- **Diffusion maps**: Advanced dimensionality reduction for perceptual space analysis
 
 ## Input Format
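The correlation-based Hungarian matching named in the feature list above can be sketched with `scipy.optimize.linear_sum_assignment`. This is a minimal illustration, not the MAPSS implementation itself (which may, for instance, correlate embeddings rather than raw waveforms):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_outputs_to_references(outputs, references):
    """Assign each separated output to a reference by maximizing
    pairwise waveform correlation (Hungarian algorithm)."""
    n = len(outputs)
    corr = np.zeros((n, n))
    for i, out in enumerate(outputs):
        for j, ref in enumerate(references):
            corr[i, j] = np.corrcoef(out, ref)[0, 1]  # Pearson correlation
    # linear_sum_assignment minimizes total cost, so negate to maximize
    rows, cols = linear_sum_assignment(-corr)
    return {int(r): int(c) for r, c in zip(rows, cols)}

# Toy check: feed the outputs in swapped order; the matcher recovers the permutation
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal(1000), rng.standard_normal(1000)
assignment = match_outputs_to_references([s2, s1], [s1, s2])
print(assignment)  # {0: 1, 1: 0}
```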
@@ -47,22 +40,11 @@ your_mixture.zip
 ## Output Format
 
 The tool generates a ZIP file containing:
-- `ps_scores_{model}.csv`: PS scores for each speaker/source
-- `pm_scores_{model}.csv`: PM scores for each speaker/source
 - `params.json`: Experiment parameters used
 - `manifest_canonical.json`: File mapping and processing details
 
-### Score Interpretation
-
-- **PS Score**: Perceptual Similarity
-  - 1.0 = Perfect separation (output identical to reference)
-  - 0.5 = Moderate separation quality
-  - 0.0 = Poor separation (output closer to other sources)
-
-- **PM Score**: Perceptual Matching (robustness)
-  - 1.0 = Highly robust to distortions
-  - 0.5 = Moderate robustness
-  - 0.0 = Not robust (easily confused with distorted versions)
-
 ## Available Models
 
 | Model | Description | Default Layer | Use Case |
@@ -85,27 +67,6 @@ The tool generates a ZIP file containing:
 - 0.0 = No normalization
 - 1.0 = Full normalization (recommended)
 
-## How It Works
-
-1. **Feature Extraction**: Audio signals are processed through pre-trained self-supervised models to extract perceptual embeddings
-2. **Voice Activity Detection**: Automatic detection of voiced segments using energy-based masking
-3. **Diffusion Maps**: Embeddings are projected using diffusion maps for robust dimensionality reduction
-4. **PS Computation**: Measures Mahalanobis distance between separated outputs and references vs other sources
-5. **PM Computation**: Evaluates against comprehensive distortions including:
-   - Noise (white, pink, brown at various SNRs)
-   - Filtering (lowpass, highpass, notch, comb)
-   - Effects (reverb, echo, tremolo, vibrato)
-   - Distortions (clipping, pitch shift, time stretch)
-6. **Scoring**: Frame-level scores are computed and aggregated
-
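Steps 3 and 4 above (diffusion-map projection, then Mahalanobis scoring) can be illustrated with a minimal NumPy sketch. The function names, the median bandwidth heuristic, and the regularization constant are assumptions for illustration, not the exact MAPSS formulation:

```python
import numpy as np

def diffusion_map(X, eps=None, dim=2):
    """Project rows of X (frames x embedding dims) onto the leading
    non-trivial eigenvectors of a row-stochastic diffusion operator."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances
    if eps is None:
        eps = np.median(d2)                               # bandwidth heuristic (assumption)
    K = np.exp(-d2 / eps)                                 # Gaussian affinity kernel
    P = K / K.sum(axis=1, keepdims=True)                  # Markov transition matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    coords = vecs.real[:, order] * vals.real[order]       # eigenvalue-weighted coordinates
    return coords[:, 1:dim + 1]                           # drop the trivial eigenvector

def mahalanobis(x, mean, cov, reg=1e-6):
    """Mahalanobis distance with covariance regularization."""
    cov = cov + reg * np.eye(cov.shape[0])
    diff = x - mean
    return float(np.sqrt(diff @ np.linalg.solve(cov, diff)))

rng = np.random.default_rng(1)
frames = rng.standard_normal((40, 8))   # 40 frames of 8-dim perceptual embeddings
Y = diffusion_map(frames, dim=2)
print(Y.shape)                          # (40, 2)
```

A PS-style score would then compare the Mahalanobis distance from an output's diffusion coordinates to its own reference against the distances to the interfering references.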
-## Technical Details
-
-- **Loudness normalization**: ITU-R BS.1770 standard (-23 LUFS)
-- **Frame-based processing**: 20ms windows with 20ms hop
-- **Correlation-based assignment**: Hungarian algorithm for optimal matching
-- **Memory optimization**: Batch processing with automatic GPU memory management
-- **Robust statistics**: Covariance regularization and outlier handling
-
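The frame-based processing described above (20 ms windows with a 20 ms hop, i.e. non-overlapping frames) and the energy-based voicing mask from the pipeline can be sketched as follows; the threshold value and function names are illustrative assumptions:

```python
import numpy as np

def frame_signal(x, sr, win_ms=20.0, hop_ms=20.0):
    """Split a 1-D signal into frames; a 20 ms window with a 20 ms hop
    yields non-overlapping frames."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(x) - win) // hop)
    return np.stack([x[i * hop : i * hop + win] for i in range(n)])

def energy_mask(frames, threshold_db=-40.0):
    """Keep frames whose energy is within threshold_db of the loudest frame
    (a simple stand-in for energy-based voice activity detection)."""
    energy = (frames ** 2).mean(axis=1)
    db = 10 * np.log10(energy / energy.max() + 1e-12)
    return db > threshold_db

sr = 16000
t = np.arange(sr) / sr                  # 1 s of audio
x = np.sin(2 * np.pi * 220 * t)
x[: sr // 2] = 0.0                      # first half is silence
frames = frame_signal(x, sr)            # 50 frames of 320 samples each
mask = energy_mask(frames)
print(frames.shape, int(mask.sum()))    # (50, 320) 25
```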
 ## Citation
 
 If you use MAPSS in your research, please cite:
@@ -122,10 +83,7 @@ If you use MAPSS in your research, please cite:
 ## Limitations
 
-- Processing time scales with audio length and model size
-- Memory requirements depend on number of sources and audio length
-- Currently optimized for speech separation (music separation support in development)
-- Maximum recommended sources: 10 per mixture
 
 ## License
@@ -134,4 +92,4 @@ Paper: CC-BY-4.0
 ## Support
 
-For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/
@@ -10,18 +10,11 @@ pinned: false
 license: mit
 ---
 
+# MAPSS: Manifold-based Assessment of Perceptual Source Separation
+
+Granular evaluation of speech and music source separation with the MAPSS measures:
+
+- **Perceptual Matching (PM)**: Measures how closely an output perceptually aligns with its reference. Range: 0-1, higher is better.
+- **Perceptual Similarity (PS)**: Measures how well an output is separated from its interfering references. Range: 0-1, higher is better.
 
 ## Input Format
 
@@ -47,22 +40,11 @@ your_mixture.zip
 ## Output Format
 
 The tool generates a ZIP file containing:
+- `ps_scores_{model}.csv`: PS scores for each speaker/source
+- `pm_scores_{model}.csv`: PM scores for each speaker/source
 - `params.json`: Experiment parameters used
 - `manifest_canonical.json`: File mapping and processing details
 
 ## Available Models
 
 | Model | Description | Default Layer | Use Case |
 
@@ -85,27 +67,6 @@ The tool generates a ZIP file containing:
 - 0.0 = No normalization
 - 1.0 = Full normalization (recommended)
 
 ## Citation
 
 If you use MAPSS in your research, please cite:
 
@@ -122,10 +83,7 @@ If you use MAPSS in your research, please cite:
 ## Limitations
 
+- Processing time scales with number of sources, audio length, and model size
 
 ## License
 
@@ -134,4 +92,4 @@ Paper: CC-BY-4.0
 ## Support
 
+For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/amir-ivry/MAPSS-measures).