---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
- speech_enhancement
- noise_suppression
- real_time
- fullband
---
# DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN
DPDFNet is a family of causal, single-channel speech enhancement models for real-time noise suppression in challenging everyday environments. It extends the DeepFilterNet2 enhancement framework by inserting Dual-Path RNN (DPRNN) blocks into the encoder, strengthening long-range temporal and cross-band modeling while preserving a compact, streaming-friendly design.

This repository provides TensorFlow Lite (TFLite) models optimized for mobile and edge deployment:

**16kHz models**

* `baseline.tflite`
* `dpdfnet2.tflite`
* `dpdfnet4.tflite`
* `dpdfnet8.tflite`

**48kHz model**

* `dpdfnet2_48khz_hr.tflite`
---
## Key Features
* Causal and low-latency: Designed for streaming use cases such as telephony, conferencing, and embedded devices.
* Dual-Path RNN integration: Improves temporal context and frequency-domain interactions for more robust enhancement in difficult noise conditions.
* Scalable family: Choose baseline or dpdfnet2/4/8 to balance quality vs. compute.
* Edge deployment focus: Demonstrated on Ceva NeuPro Nano NPUs in the accompanying work.
* Fullband option: A dedicated 48kHz model is provided for fullband enhancement.
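
The dual-path idea can be pictured with a toy NumPy sketch. It is not DPDFNet's code: the `(frames, bands, channels)` layout, the plain tanh RNN cells standing in for the model's actual recurrent layers, the random weights, and the names `rnn_scan` and `DualPathBlock` are all assumptions for illustration. To stay compatible with causal streaming, the sketch scans across frequency bands within a single frame (the intra path, which may run in both directions because it never looks at other frames) and scans forward-only across frames (the inter path). Each path adds its output back through a residual connection, so changing a future frame cannot alter the output for earlier frames.

```python
import numpy as np


def rnn_scan(x, w_in, w_rec, b):
    """Run a plain tanh RNN along axis 0 of x (shape (L, C)); return (L, H)."""
    h = np.zeros(w_rec.shape[0])
    out = np.empty((x.shape[0], w_rec.shape[0]))
    for i in range(x.shape[0]):
        h = np.tanh(x[i] @ w_in + h @ w_rec + b)
        out[i] = h
    return out


class DualPathBlock:
    """Toy dual-path block: intra-frame scan over bands, causal scan over time."""

    def __init__(self, channels, seed=0):
        rng = np.random.default_rng(seed)

        def rnn_params():
            # (input weights, recurrent weights, bias); hidden size = channels
            return (rng.normal(0.0, 0.1, (channels, channels)),
                    rng.normal(0.0, 0.1, (channels, channels)),
                    np.zeros(channels))

        self.intra_fwd = rnn_params()
        self.intra_bwd = rnn_params()
        self.intra_proj = rng.normal(0.0, 0.1, (2 * channels, channels))
        self.inter = rnn_params()

    def __call__(self, x):
        """x has shape (T frames, F bands, C channels); output has the same shape."""
        x = np.asarray(x, dtype=np.float64)
        num_frames, num_bands, _ = x.shape
        # Intra path: within each frame, scan the bands in both directions.
        intra = np.empty_like(x)
        for t in range(num_frames):
            fwd = rnn_scan(x[t], *self.intra_fwd)
            bwd = rnn_scan(x[t, ::-1], *self.intra_bwd)[::-1]
            intra[t] = np.concatenate([fwd, bwd], axis=-1) @ self.intra_proj
        x = x + intra  # residual connection
        # Inter path: for each band, a forward-only scan over frames (causal).
        inter = np.empty_like(x)
        for f in range(num_bands):
            inter[:, f] = rnn_scan(x[:, f], *self.inter)
        return x + inter  # residual connection


features = np.random.default_rng(1).normal(size=(50, 16, 8))  # 50 frames, 16 bands
enhanced_features = DualPathBlock(channels=8)(features)
print(enhanced_features.shape)  # (50, 16, 8)
```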
---
## Model Variants and Footprint
### 16kHz models
| Model | Params [M] | MACs [G] | TFLite Size [MB] |
| --------- | :--------: | :------: | :--------------: |
| Baseline | 2.31 | 0.36 | 8.5 |
| DPDFNet-2 | 2.49 | 1.35 | 10.7 |
| DPDFNet-4 | 2.84 | 2.36 | 12.9 |
| DPDFNet-8 | 3.54 | 4.37 | 17.2 |
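
To pick a variant for a given compute budget, the figures from the table above can be encoded directly. The sketch below assumes that more MACs buy better enhancement quality (the quality-vs-compute trade-off described under Key Features) and returns the heaviest variant that fits; the budget semantics and the `pick_variant` helper are illustrative assumptions, not part of the repository. For reference, DPDFNet-2 costs about 3.75× the baseline's MACs and DPDFNet-8 about 12×.

```python
# Figures copied from the 16kHz table above:
# model name -> (params [M], MACs [G], TFLite size [MB])
VARIANTS_16K = {
    "baseline": (2.31, 0.36, 8.5),
    "dpdfnet2": (2.49, 1.35, 10.7),
    "dpdfnet4": (2.84, 2.36, 12.9),
    "dpdfnet8": (3.54, 4.37, 17.2),
}


def pick_variant(max_gmacs, max_size_mb=float("inf")):
    """Return the heaviest 16kHz variant within both budgets, or None if none fits.

    Assumes more MACs means better quality, per the quality-vs-compute trade-off.
    """
    fitting = [
        (macs, name)
        for name, (_params, macs, size_mb) in VARIANTS_16K.items()
        if macs <= max_gmacs and size_mb <= max_size_mb
    ]
    return max(fitting)[1] if fitting else None


# Relative compute of each variant, using the baseline's MACs as the unit.
baseline_macs = VARIANTS_16K["baseline"][1]
for name, (_params, macs, _size_mb) in VARIANTS_16K.items():
    print(f"{name}: {macs / baseline_macs:.2f}x baseline MACs")
```

For instance, `pick_variant(2.5)` returns `"dpdfnet4"`, and adding `max_size_mb=10.0` narrows the choice to `"baseline"`.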
### 48kHz model
| Model | Params [M] | MACs [G] | TFLite Size [MB] |
| ------------ | :--------: | :------: | :--------------: |
| DPDFNet-2 HR | 2.58 | 2.42 | 11.6 |
---
## Intended Use
**Primary task:** Real-time, single-channel speech enhancement (noise suppression).

**Deployment targets:** Mobile devices, embedded NPUs, and edge platforms.

**Input and output:**

* **16kHz models**
  * Input: 16kHz mono noisy speech waveform
  * Output: 16kHz mono enhanced speech waveform
* **48kHz model**
  * Input: 48kHz mono noisy speech waveform
  * Output: 48kHz mono enhanced speech waveform

**Typical applications:**

* Voice calls and VoIP
* Video conferencing
* Always-on voice interfaces
* Wearables, earbuds, and embedded audio devices
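
Before inference, audio has to match the input format listed above: mono, at 16kHz for the 16kHz models or 48kHz for `dpdfnet2_48khz_hr`. Whether `run_tflite.py` itself downmixes or resamples is not stated here, so the hypothetical `prepare_input` helper below only downmixes, converts integer PCM to float, and refuses mismatched sample rates. Resampling, if needed, could be done beforehand with librosa (already a listed dependency), e.g. `librosa.resample(y, orig_sr=sr, target_sr=16000)`.

```python
import numpy as np

# Sample rates from the Input and Output list above, keyed by --model_name.
EXPECTED_SR = {
    "baseline": 16000,
    "dpdfnet2": 16000,
    "dpdfnet4": 16000,
    "dpdfnet8": 16000,
    "dpdfnet2_48khz_hr": 48000,
}


def prepare_input(audio, sr, model_name):
    """Return mono float32 audio for `model_name`, or raise if the rate is wrong."""
    expected = EXPECTED_SR[model_name]
    if sr != expected:
        raise ValueError(
            f"{model_name} expects {expected} Hz audio, got {sr} Hz; resample first"
        )
    audio = np.asarray(audio)
    if np.issubdtype(audio.dtype, np.signedinteger):
        # Map integer PCM (e.g. int16) to floats in [-1, 1).
        scale = -float(np.iinfo(audio.dtype).min)
        audio = audio.astype(np.float32) / scale
    if audio.ndim == 2:
        # Multichannel arrays laid out as (samples, channels), as soundfile.read returns.
        audio = audio.mean(axis=1)
    elif audio.ndim != 1:
        raise ValueError(f"expected 1-D or 2-D audio, got shape {audio.shape}")
    return audio.astype(np.float32)
```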
---
## Inference
This repo includes an inference script, `run_tflite.py`, that runs the TFLite models on WAV files using streaming-style, frame-by-frame inference.
> **Note:** When using `dpdfnet2_48khz_hr`, the inference script automatically switches to the 48kHz processing pipeline.
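
Streaming inference means the enhancer sees the signal one short frame at a time and carries its recurrent state forward, rather than processing a whole file at once. The sketch below shows that loop together with the name-based 48kHz switch from the note above. It is not taken from `run_tflite.py`: the hop sizes (an assumed 10 ms, i.e. 160 samples at 16kHz and 480 at 48kHz), the `process_frame(frame, state)` callback standing in for one TFLite interpreter call, and the helper names are assumptions. A real STFT-based pipeline may also add a fixed algorithmic delay, which this sketch ignores.

```python
import numpy as np

# Assumed 10 ms hops; the real frame/hop sizes are defined by the models and script.
HOP_SAMPLES = {16000: 160, 48000: 480}


def sample_rate_for(model_name):
    """The 48 kHz model selects the 48 kHz pipeline; all others run at 16 kHz."""
    return 48000 if model_name == "dpdfnet2_48khz_hr" else 16000


def stream_enhance(audio, model_name, process_frame):
    """Feed `audio` to `process_frame` one hop at a time and stitch the results.

    `process_frame(frame, state) -> (enhanced_frame, new_state)` is a placeholder
    for a single model invocation; `state` carries recurrent memory between calls
    (it is None on the first call).
    """
    audio = np.asarray(audio, dtype=np.float32)
    n = len(audio)
    if n == 0:
        return audio.copy()
    hop = HOP_SAMPLES[sample_rate_for(model_name)]
    padded = np.pad(audio, (0, -n % hop))  # zero-pad to a whole number of hops
    outputs, state = [], None
    for start in range(0, len(padded), hop):
        enhanced, state = process_frame(padded[start:start + hop], state)
        outputs.append(enhanced)
    return np.concatenate(outputs)[:n]  # drop the padding again
```

For example, `stream_enhance(x, "dpdfnet8", lambda frame, state: (frame, state))` returns the input samples unchanged, which is a handy sanity check for the framing logic before a real model is plugged in.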
### Setup
Install dependencies:
```bash
pip install numpy soundfile librosa tqdm
pip install tflite-runtime
```
### Model placement
By default, the script loads models from:
* `./<model_name>.tflite`

Place the `.tflite` files at that location (or edit `TFLITE_DIR` in the script to match your layout).
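
Loading a model from that location and checking what it expects can be sketched as follows. `TFLITE_DIR` is the setting mentioned above, but its value here (`"."`, matching the default `./<model_name>.tflite`) and the helpers `model_path`, `load_interpreter`, and `describe_io` are illustrative assumptions rather than the script's code. The interpreter calls (`Interpreter(model_path=...)`, `allocate_tensors()`, `get_input_details()`, `get_output_details()`) are the standard TFLite Python API; the fallback to `tf.lite.Interpreter` applies only if full TensorFlow is installed instead of `tflite-runtime`.

```python
import os

TFLITE_DIR = "."  # matches the default ./<model_name>.tflite layout; edit as needed
MODEL_NAMES = ("baseline", "dpdfnet2", "dpdfnet4", "dpdfnet8", "dpdfnet2_48khz_hr")


def model_path(model_name, tflite_dir=TFLITE_DIR):
    """Return <tflite_dir>/<model_name>.tflite, failing early on typos or missing files."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model {model_name!r}; choose one of {MODEL_NAMES}")
    path = os.path.join(tflite_dir, f"{model_name}.tflite")
    if not os.path.isfile(path):
        raise FileNotFoundError(f"{path} not found; place the .tflite file there")
    return path


def load_interpreter(path):
    """Load a .tflite model with tflite-runtime, or with full TensorFlow if installed instead."""
    try:
        from tflite_runtime.interpreter import Interpreter
    except ImportError:
        import tensorflow as tf
        Interpreter = tf.lite.Interpreter
    interpreter = Interpreter(model_path=path)
    interpreter.allocate_tensors()
    return interpreter


def describe_io(interpreter):
    """Print the name, shape, and dtype of every input and output tensor."""
    for kind, details in (("input", interpreter.get_input_details()),
                          ("output", interpreter.get_output_details())):
        for d in details:
            print(f"{kind}: {d['name']} shape={d['shape']} dtype={d['dtype']}")
```

Calling `describe_io(load_interpreter(model_path("dpdfnet4")))` prints the tensor layout, which is a quick way to confirm the per-frame input sizes before writing any streaming code.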
### Run enhancement on a folder of WAVs
The script processes the `*.wav` files in `--noisy_dir` non-recursively and writes the enhanced outputs to `--enhanced_dir` as 16-bit PCM WAV files:
```bash
python run_tflite.py --noisy_dir /path/to/noisy_wavs --enhanced_dir /path/to/out --model_name dpdfnet8
```
Available `--model_name` options: `baseline`, `dpdfnet2`, `dpdfnet4`, `dpdfnet8`, `dpdfnet2_48khz_hr`.
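
The folder handling described above (top-level `*.wav` files only, 16-bit PCM output) can be sketched with `pathlib` and the standard-library `wave` module. This is not the code of `run_tflite.py`: the helper names, the `enhance_fn(audio, sr)` placeholder for model inference, keeping each input's file name in the output folder, and the symmetric 32767 scaling are assumptions. `Path.glob("*.wav")` without `**` does not descend into subfolders, which matches the non-recursive behavior. Reading uses `soundfile.read` from the listed dependencies and is imported lazily inside `enhance_folder`, so the other helpers work without it.

```python
import wave
from pathlib import Path

import numpy as np


def list_noisy_wavs(noisy_dir):
    """Top-level *.wav files only; a glob pattern without '**' does not recurse."""
    return sorted(Path(noisy_dir).glob("*.wav"))


def write_pcm16(path, audio, sr):
    """Write mono float audio (nominally in [-1, 1]) as a 16-bit PCM WAV."""
    clipped = np.clip(np.asarray(audio, dtype=np.float64), -1.0, 1.0)
    pcm = np.round(clipped * 32767).astype("<i2")  # little-endian int16 samples
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sr)
        wf.writeframes(pcm.tobytes())


def enhance_folder(noisy_dir, enhanced_dir, enhance_fn):
    """Enhance every top-level WAV in noisy_dir and write it to enhanced_dir."""
    import soundfile as sf  # listed in the Setup dependencies

    out_dir = Path(enhanced_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in list_noisy_wavs(noisy_dir):
        audio, sr = sf.read(str(src), dtype="float32")
        write_pcm16(out_dir / src.name, enhance_fn(audio, sr), sr)
```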

---
## Training Data
The models were trained on a mixture of public speech and noise datasets, including DNS4 (downsampled), MLS, MUSAN, and FSD50K.

---
## Citation
If you use these models, please cite:
```bibtex
@article{rika2025dpdfnet,
title = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
author = {Rika, Daniel and Sapir, Nino and Gus, Ido},
year = {2025}
}
```
---
## License
Apache-2.0