AIvry committed
Commit
f76a9ce
·
verified ·
1 Parent(s): 480d8a7

Delete hf_readme.md

Files changed (1)
  1. hf_readme.md +0 -136
hf_readme.md DELETED
@@ -1,136 +0,0 @@
---
title: MAPSS Multi Source Audio Perceptual Separation Scores
emoji: 🎡
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit
---

# MAPSS: Multi-source Audio Perceptual Separation Scores

Evaluate audio source separation quality using Perceptual Similarity (PS) and Perceptual Matching (PM) metrics.

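You can also drive the Space programmatically instead of through the UI. Below is a minimal sketch using `gradio_client`; the Space id `AIvry/MAPSS` and the `/predict` endpoint name are assumptions, so check the Space's "Use via API" panel for the real values:

```python
from gradio_client import Client, handle_file

client = Client("AIvry/MAPSS")             # hypothetical Space id; replace with the real one
result = client.predict(
    handle_file("your_mixture.zip"),       # ZIP in the layout described below
    api_name="/predict",                   # assumed default endpoint name
)
print(result)                              # typically a path to the results ZIP
```
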
## Features

- **Perceptual Similarity (PS)**: Measures how similar separated outputs are to reference sources in perceptual embedding space
- **Perceptual Matching (PM)**: Evaluates robustness against a comprehensive set of audio distortions
- **Multiple embedding models**: Support for WavLM, Wav2Vec2, HuBERT, AST, and more
- **Automatic output-to-reference matching**: Uses a correlation-based cost matrix with the Hungarian algorithm
- **GPU-optimized processing**: Efficient batch processing with memory management
- **Diffusion maps**: Dimensionality reduction for perceptual-space analysis

## Input Format

Upload a ZIP file containing:
```
your_mixture.zip
├── references/          # Original clean sources
│   ├── speaker1.wav
│   ├── speaker2.wav
│   └── ...
└── outputs/             # Separated outputs from your algorithm
    ├── separated1.wav
    ├── separated2.wav
    └── ...
```

### Audio Requirements

- Format: WAV files
- Sample rate: any (automatically resampled to 16 kHz)
- Channels: mono or stereo (stereo is downmixed to mono)
- Number of files: the number of outputs must equal the number of references

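For convenience, here is one way to assemble a compliant ZIP from two local folders using only the Python standard library; the folder names are placeholders, and only the archive-internal `references/` and `outputs/` paths matter:

```python
import zipfile
from pathlib import Path

def package_mixture(ref_dir: str, out_dir: str, zip_path: str = "your_mixture.zip") -> str:
    """Zip reference and output WAVs into the references/ + outputs/ layout."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for wav in sorted(Path(ref_dir).glob("*.wav")):
            zf.write(wav, f"references/{wav.name}")   # store under references/
        for wav in sorted(Path(out_dir).glob("*.wav")):
            zf.write(wav, f"outputs/{wav.name}")      # store under outputs/
    return zip_path

package_mixture("my_refs", "my_separated")            # placeholder folder names
```
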
## Output Format

The tool generates a ZIP file containing:
- `ps_scores_{model}.csv`: PS scores for each speaker/source (0-1, higher is better)
- `pm_scores_{model}.csv`: PM scores for each speaker/source (0-1, higher is better)
- `params.json`: Experiment parameters used
- `manifest_canonical.json`: File mapping and processing details

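A quick way to inspect the returned archive; a minimal sketch assuming a WavLM run and a downloaded file named `mapss_results.zip` (both names are placeholders; adjust them to your run):

```python
import json
import zipfile
import pandas as pd

# "mapss_results.zip" and the "wavlm" suffix are placeholders;
# use the archive and model name from your own run.
with zipfile.ZipFile("mapss_results.zip") as zf:
    ps = pd.read_csv(zf.open("ps_scores_wavlm.csv"))
    pm = pd.read_csv(zf.open("pm_scores_wavlm.csv"))
    params = json.loads(zf.read("params.json"))

print(params)
print(ps.describe())   # distribution of PS scores across sources
print(pm.describe())
```
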
### Score Interpretation

- **PS Score**: Perceptual Similarity
  - 1.0 = Perfect separation (output identical to reference)
  - 0.5 = Moderate separation quality
  - 0.0 = Poor separation (output closer to other sources)

- **PM Score**: Perceptual Matching (robustness)
  - 1.0 = Highly robust to distortions
  - 0.5 = Moderate robustness
  - 0.0 = Not robust (easily confused with distorted versions)

## Available Models

| Model | Description | Default Layer | Use Case |
|-------|-------------|---------------|----------|
| `raw` | Raw waveform features | N/A | Baseline comparison |
| `wavlm` | WavLM Large | 24 | Best overall performance |
| `wav2vec2` | Wav2Vec2 Large | 24 | Strong performance |
| `hubert` | HuBERT Large | 24 | Good for speech |
| `wavlm_base` | WavLM Base | 12 | Faster, good quality |
| `wav2vec2_base` | Wav2Vec2 Base | 12 | Faster processing |
| `hubert_base` | HuBERT Base | 12 | Faster for speech |
| `wav2vec2_xlsr` | Wav2Vec2 XLSR-53 | 24 | Multilingual |
| `ast` | Audio Spectrogram Transformer | 12 | General audio |

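For intuition about the Default Layer column, the sketch below pulls layer-24 embeddings from WavLM Large with Hugging Face `transformers`. The checkpoint `microsoft/wavlm-large` is an assumption; the Space's internal model-to-checkpoint mapping is not documented here:

```python
import torch
import torchaudio
from transformers import AutoFeatureExtractor, AutoModel

# "microsoft/wavlm-large" is an assumed checkpoint, not confirmed by this README.
extractor = AutoFeatureExtractor.from_pretrained("microsoft/wavlm-large")
model = AutoModel.from_pretrained("microsoft/wavlm-large").eval()

wav, sr = torchaudio.load("speaker1.wav")
wav = torchaudio.functional.resample(wav.mean(dim=0), sr, 16000)  # mono, 16 kHz

inputs = extractor(wav.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the projected convolutional front-end;
# entries 1..24 are the transformer layers, so [24] is the top layer.
frames = out.hidden_states[24].squeeze(0)   # shape: (num_frames, 1024)
```
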
## Parameters

- **Model**: Select the embedding model for feature extraction
- **Layer**: Which transformer layer to use (auto-selected by default)
- **Alpha**: Diffusion maps density-normalization parameter (0.0-1.0, default: 1.0)
  - 0.0 = No density normalization
  - 1.0 = Full density normalization (recommended)

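To see what Alpha controls, here is a generic diffusion-maps sketch (illustrative, not the Space's implementation): alpha is the exponent of the density normalization applied to the Gaussian kernel before the diffusion operator is built.

```python
import numpy as np
from scipy.spatial.distance import cdist

def diffusion_map(X, alpha=1.0, n_components=8, eps=None):
    """Generic diffusion-maps embedding (illustrative, not MAPSS's own code)."""
    D2 = cdist(X, X, "sqeuclidean")
    if eps is None:
        eps = np.median(D2)                      # common bandwidth heuristic
    K = np.exp(-D2 / eps)                        # Gaussian affinity kernel
    d = K.sum(axis=1)
    K = K / np.outer(d, d) ** alpha              # alpha = 0: raw kernel; alpha = 1: density removed
    P = K / K.sum(axis=1, keepdims=True)         # row-normalize to a Markov matrix
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    idx = order[1:n_components + 1]              # skip the trivial eigenvalue 1
    return vecs[:, idx].real * vals[idx].real    # diffusion coordinates
```

With alpha = 1.0 the embedding reflects the shape of the data manifold rather than how densely each region was sampled, which is why full normalization is the recommended setting.
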
## How It Works

1. **Feature Extraction**: Audio signals are processed through pre-trained self-supervised models to extract perceptual embeddings
2. **Voice Activity Detection**: Automatic detection of voiced segments using energy-based masking
3. **Diffusion Maps**: Embeddings are projected with diffusion maps for robust dimensionality reduction
4. **PS Computation**: Measures the Mahalanobis distance from each separated output to its own reference versus the other sources (see the sketch after this list)
5. **PM Computation**: Evaluates against comprehensive distortions, including:
   - Noise (white, pink, brown at various SNRs)
   - Filtering (lowpass, highpass, notch, comb)
   - Effects (reverb, echo, tremolo, vibrato)
   - Distortions (clipping, pitch shift, time stretch)
6. **Scoring**: Frame-level scores are computed and aggregated

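To make step 4 concrete, the toy score below applies the Mahalanobis idea: frames of an output should lie close to the embedding cloud of their own reference and far from the clouds of competing sources. It illustrates the geometry only and is not the actual MAPSS formula:

```python
import numpy as np

def _cloud_stats(F, ridge=1e-6):
    """Mean and inverse covariance of a (frames x dims) embedding cloud."""
    mu = F.mean(axis=0)
    cov = np.cov(F, rowvar=False) + ridge * np.eye(F.shape[1])  # regularized
    return mu, np.linalg.inv(cov)

def ps_like_score(out_frames, ref_frames, competitor_frames):
    """Toy PS-style score in (0, 1): near 1 when the output sits on its
    reference cloud and far from every competing source's cloud."""
    mu, cov_inv = _cloud_stats(ref_frames)
    diffs = out_frames - mu
    d_ref = np.mean(np.einsum("nd,de,ne->n", diffs, cov_inv, diffs))
    d_comp = []
    for C in competitor_frames:                 # list of other sources' frames
        mu_c, cinv_c = _cloud_stats(C)
        dc = out_frames - mu_c
        d_comp.append(np.mean(np.einsum("nd,de,ne->n", dc, cinv_c, dc)))
    d_other = min(d_comp)                       # nearest competing cloud
    return d_other / (d_ref + d_other)
```
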
## Technical Details

- **Loudness normalization**: ITU-R BS.1770 standard (-23 LUFS)
- **Frame-based processing**: 20 ms windows with a 20 ms hop (non-overlapping frames)
- **Correlation-based assignment**: Hungarian algorithm for optimal matching (sketched below)
- **Memory optimization**: Batch processing with automatic GPU memory management
- **Robust statistics**: Covariance regularization and outlier handling

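The correlation-based assignment is easy to reproduce; a minimal sketch with SciPy's Hungarian solver, assuming time-domain correlation as the affinity (MAPSS may compute the correlation in another domain):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_outputs_to_references(outputs, references):
    """Assign each separated output to one reference by maximizing
    pairwise correlation via the Hungarian algorithm."""
    cost = np.zeros((len(outputs), len(references)))
    for i, out in enumerate(outputs):
        for j, ref in enumerate(references):
            n = min(len(out), len(ref))
            r = np.corrcoef(out[:n], ref[:n])[0, 1]
            cost[i, j] = -abs(r)                # negate: the solver minimizes cost
    rows, cols = linear_sum_assignment(cost)
    return dict(zip(rows.tolist(), cols.tolist()))  # output idx -> reference idx
```

`linear_sum_assignment` returns the permutation that maximizes total absolute correlation, which is how outputs and references are paired before scoring.
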
## Citation

If you use MAPSS in your research, please cite:

```bibtex
@article{mapss2024,
  title={MAPSS: Multi-source Audio Perceptual Separation Scores},
  author={Your Name},
  journal={arXiv preprint},
  year={2024}
}
```

## Limitations

- Processing time scales with audio length and model size
- Memory requirements depend on the number of sources and audio length
- Currently optimized for speech separation (music separation support is in development)
- Maximum recommended sources: 10 per mixture

## License

Code: MIT License
Paper: CC-BY-4.0

## Support

For issues, questions, or contributions, please visit the [GitHub repository](https://github.com/yourusername/mapss).