Update app.py
app.py
CHANGED
@@ -176,7 +176,7 @@ def create_interface():
 ```
 
 ### Audio Requirements
-- Format:
+- Format: .wav files
 - Sample rate: Any (automatically resampled to 16kHz)
 - Channels: Mono or stereo (converted to mono)
 - Number of files: Equal number of references and outputs
@@ -184,9 +184,9 @@ def create_interface():
 ## Output Format
 
 The tool generates a ZIP file containing:
-- `ps_scores_{model}.csv`: PS scores for each
-- `pm_scores_{model}.csv`: PM scores for each
-- `params.json`:
+- `ps_scores_{model}.csv`: PS scores for each source
+- `pm_scores_{model}.csv`: PM scores for each source
+- `params.json`: Parameters used
 - `manifest_canonical.json`: File mapping and processing details
 
 ## Available Models
@@ -194,14 +194,14 @@ def create_interface():
 | Model | Description | Default Layer | Use Case |
 |-------|-------------|---------------|----------|
 | `raw` | Raw waveform features | N/A | Baseline comparison |
-| `wavlm` | WavLM Large | 24 |
-| `wav2vec2` | Wav2Vec2 Large | 24 |
-| `hubert` | HuBERT Large | 24 |
-| `wavlm_base` | WavLM Base | 12 |
-| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster
-| `hubert_base` | HuBERT Base | 12 |
+| `wavlm` | WavLM Large | 24 | Strong performance |
+| `wav2vec2` | Wav2Vec2 Large | 24 | Best overall performance |
+| `hubert` | HuBERT Large | 24 | |
+| `wavlm_base` | WavLM Base | 12 | |
+| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster, good quality |
+| `hubert_base` | HuBERT Base | 12 | |
 | `wav2vec2_xlsr` | Wav2Vec2 XLSR-53 | 24 | Multilingual |
-| `ast` | Audio Spectrogram Transformer | 12 |
+| `ast` | Audio Spectrogram Transformer | 12 | Music |
 
 ## Parameters
 
@@ -213,7 +213,7 @@ def create_interface():
 
 ## Citation
 
-If you use MAPSS
+If you use MAPSS, please cite:
 
 ```bibtex
 @article{Ivry2025MAPSS,
@@ -317,27 +317,8 @@ def create_interface():
 max_lines=10
 )
 
-gr.Markdown("""
-
-The results ZIP will contain:
-- `ps_scores_{model}.csv`: Perceptual Similarity scores for each speaker/source
-- `pm_scores_{model}.csv`: Perceptual Matching scores for each speaker/source
-- `params.json`: Experiment parameters
-- `manifest_canonical.json`: Processed file manifest
-
-## Score interpretation:
-- **PS (Perceptual Similarity)**: 0-1 score, higher is better. Measures how well the separated output matches the reference compared to other sources.
-- **PM (Perceptual Matching)**: 0-1 score, higher is better. Measures robustness to audio distortions.
-
-## Notes:
-- Processing may take several minutes depending on the audio length and model
-- Audio files are automatically resampled to 16kHz
-- The tool automatically matches outputs to references based on correlation
-- For best results, ensure equal number of reference and output files
-
-## Citation:
-If you use this tool in your research, please cite our paper (details coming soon).
-""")
+# gr.Markdown("""
+# """)
 
 # Set up the processing
 process_btn.click(
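
The "Output Format" text in this diff names four files inside the results ZIP. As a rough illustration of how a user might read them back, here is a minimal sketch using only the Python standard library. `load_results` is a hypothetical helper and is not part of `app.py`. The CSV column layout is not described in the diff, so rows are returned as raw lists rather than parsed into named fields. The lookup by base name is an assumption to cope with archives that nest files in a folder.

```python
import csv
import io
import json
import zipfile


def load_results(zip_source, model):
    """Read the documented files from a results ZIP (hypothetical helper).

    zip_source: a path or file-like object accepted by zipfile.ZipFile.
    model: the model key used in the file names, e.g. "wav2vec2".
    Files that are missing from the archive are simply skipped.
    """
    results = {}
    with zipfile.ZipFile(zip_source) as zf:
        # Index members by base name in case the archive nests them in a folder.
        members = {name.rsplit("/", 1)[-1]: name for name in zf.namelist()}

        for key in ("ps", "pm"):
            fname = f"{key}_scores_{model}.csv"
            if fname in members:
                text = zf.read(members[fname]).decode("utf-8")
                # Column names are not documented, so keep rows as raw lists.
                results[key] = list(csv.reader(io.StringIO(text)))

        for key, fname in (("params", "params.json"),
                           ("manifest", "manifest_canonical.json")):
            if fname in members:
                results[key] = json.loads(zf.read(members[fname]).decode("utf-8"))
    return results
```

Calling `load_results("results.zip", "wav2vec2")` would return a dict with whichever of the keys `ps`, `pm`, `params`, and `manifest` were found; the archive name here is only a placeholder.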