AIvry committed on
Commit ee3a404 · verified · 1 Parent(s): aafd275

Update README.md

Files changed (1):
  1. README.md +8 -50
README.md CHANGED
@@ -10,18 +10,11 @@ pinned: false
 license: mit
 ---
 
-# MAPSS: Multi-source Audio Perceptual Separation Scores
-
-Evaluate audio source separation quality using Perceptual Similarity (PS) and Perceptual Matching (PM) metrics.
-
-## Features
-
-- **Perceptual Similarity (PS)**: Measures how similar separated outputs are to reference sources in perceptual embedding space
-- **Perceptual Matching (PM)**: Evaluates robustness against a comprehensive set of audio distortions
-- **Multiple embedding models**: Support for WavLM, Wav2Vec2, HuBERT, AST, and more
-- **Automatic output-to-reference matching**: Uses correlation-based Hungarian algorithm
-- **GPU-optimized processing**: Efficient batch processing with memory management
-- **Diffusion maps**: Advanced dimensionality reduction for perceptual space analysis
+# MAPSS: Manifold-based Assessment of Perceptual Source Separation
+
+Granular evaluation of speech and music source separation with the MAPSS measures:
+- **Perceptual Matching (PM)**: Measures how closely an output perceptually aligns with its reference. Range: 0-1, higher is better.
+- **Perceptual Similarity (PS)**: Measures how well an output is separated from its interfering references. Range: 0-1, higher is better.
 
 ## Input Format
 
@@ -47,22 +40,11 @@ your_mixture.zip
 ## Output Format
 
 The tool generates a ZIP file containing:
-- `ps_scores_{model}.csv`: PS scores for each speaker/source (0-1, higher is better)
-- `pm_scores_{model}.csv`: PM scores for each speaker/source (0-1, higher is better)
+- `ps_scores_{model}.csv`: PS scores for each speaker/source
+- `pm_scores_{model}.csv`: PM scores for each speaker/source
 - `params.json`: Experiment parameters used
 - `manifest_canonical.json`: File mapping and processing details
 
-### Score Interpretation
-- **PS Score**: Perceptual Similarity
-  - 1.0 = Perfect separation (output identical to reference)
-  - 0.5 = Moderate separation quality
-  - 0.0 = Poor separation (output closer to other sources)
-
-- **PM Score**: Perceptual Matching (robustness)
-  - 1.0 = Highly robust to distortions
-  - 0.5 = Moderate robustness
-  - 0.0 = Not robust (easily confused with distorted versions)
-
 ## Available Models
 
 | Model | Description | Default Layer | Use Case |
@@ -85,27 +67,6 @@ The tool generates a ZIP file containing:
 - 0.0 = No normalization
 - 1.0 = Full normalization (recommended)
 
-## How It Works
-
-1. **Feature Extraction**: Audio signals are processed through pre-trained self-supervised models to extract perceptual embeddings
-2. **Voice Activity Detection**: Automatic detection of voiced segments using energy-based masking
-3. **Diffusion Maps**: Embeddings are projected using diffusion maps for robust dimensionality reduction
-4. **PS Computation**: Measures Mahalanobis distance between separated outputs and references vs other sources
-5. **PM Computation**: Evaluates against comprehensive distortions including:
-   - Noise (white, pink, brown at various SNRs)
-   - Filtering (lowpass, highpass, notch, comb)
-   - Effects (reverb, echo, tremolo, vibrato)
-   - Distortions (clipping, pitch shift, time stretch)
-6. **Scoring**: Frame-level scores are computed and aggregated
-
-## Technical Details
-
-- **Loudness normalization**: ITU-R BS.1770 standard (-23 LUFS)
-- **Frame-based processing**: 20ms windows with 20ms hop
-- **Correlation-based assignment**: Hungarian algorithm for optimal matching
-- **Memory optimization**: Batch processing with automatic GPU memory management
-- **Robust statistics**: Covariance regularization and outlier handling
-
 ## Citation
 
 If you use MAPSS in your research, please cite:
@@ -122,10 +83,7 @@ If you use MAPSS in your research, please cite:
 
 ## Limitations
 
-- Processing time scales with audio length and model size
-- Memory requirements depend on number of sources and audio length
-- Currently optimized for speech separation (music separation support in development)
-- Maximum recommended sources: 10 per mixture
+- Processing time scales with number of sources, audio length and model size
 
 ## License
 
@@ -134,4 +92,4 @@ Paper: CC-BY-4.0
 
 ## Support
 
-For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/yourusername/mapss).
+For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/amir-ivry/MAPSS-measures).
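The updated README describes the output as a ZIP containing `ps_scores_{model}.csv`, `pm_scores_{model}.csv`, and `params.json`. A minimal sketch of how a consumer might unpack those results, assuming that file layout; the model name (`wavlm`) and CSV column names here are illustrative assumptions, not part of the documented format:

```python
# Hypothetical helper for reading a MAPSS results ZIP.
# File names follow the README's described output format; the model name
# and CSV column layout are assumptions for illustration only.
import csv
import io
import json
import zipfile


def load_mapss_results(zip_path, model="wavlm"):
    """Return (ps_rows, pm_rows, params) loaded from a MAPSS output ZIP."""
    with zipfile.ZipFile(zip_path) as zf:
        def read_csv(name):
            # Decode the archived CSV as UTF-8 text and parse rows as dicts.
            with zf.open(name) as f:
                return list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")))

        ps_rows = read_csv(f"ps_scores_{model}.csv")
        pm_rows = read_csv(f"pm_scores_{model}.csv")
        params = json.loads(zf.read("params.json"))
    return ps_rows, pm_rows, params
```

Both measures lie in 0-1 with higher being better, so a quick sanity check is that every loaded score parses into that range.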