---
license: apache-2.0
pipeline_tag: audio-to-audio
tags:
  - speech_enhancement
  - noise_suppression
  - real_time
  - fullband
---


# DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN

DPDFNet is a family of causal, single-channel speech enhancement models for real-time noise suppression in challenging everyday environments. It extends the DeepFilterNet2 enhancement framework by inserting Dual-Path RNN (DPRNN) blocks into the encoder, strengthening long-range temporal and cross-band modeling while preserving a compact, streaming-friendly design.

This repository provides TensorFlow Lite (TFLite) models optimized for mobile and edge deployment:

**16kHz models**
* `baseline.tflite`
* `dpdfnet2.tflite`
* `dpdfnet4.tflite`
* `dpdfnet8.tflite`

**48kHz model**
* `dpdfnet2_48khz_hr.tflite`

---

## Key Features

* Causal and low-latency: Designed for streaming use cases such as telephony, conferencing, and embedded devices.
* Dual-Path RNN integration: Improves temporal context and frequency-domain interactions for more robust enhancement in difficult noise conditions.
* Scalable family: Choose `baseline` or `dpdfnet2`, `dpdfnet4`, or `dpdfnet8` to trade enhancement quality against compute.
* Edge deployment focus: Demonstrated on Ceva NeuPro Nano NPUs in the accompanying work.
* Fullband option: A dedicated 48kHz model is provided for fullband enhancement.

---

## Model Variants and Footprint

### 16kHz models

| Model     | Params [M] | MACs [G] | TFLite Size [MB] |
| --------- | :--------: | :------: | :--------------: |
| Baseline  |    2.31    |   0.36   |       8.5        |
| DPDFNet-2 |    2.49    |   1.35   |      10.7        |
| DPDFNet-4 |    2.84    |   2.36   |      12.9        |
| DPDFNet-8 |    3.54    |   4.37   |      17.2        |
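
As a rough illustration of how the footprint figures can guide model choice, the sketch below picks the largest 16kHz variant that fits a compute and size budget. The `pick_model` helper is hypothetical and not part of this repository, and it assumes that a variant with more MACs gives better enhancement quality.

```python
# Footprint figures copied from the 16kHz table above.
MODELS_16K = [
    # (model_name, params_M, macs_G, tflite_size_MB)
    ("baseline", 2.31, 0.36, 8.5),
    ("dpdfnet2", 2.49, 1.35, 10.7),
    ("dpdfnet4", 2.84, 2.36, 12.9),
    ("dpdfnet8", 3.54, 4.37, 17.2),
]


def pick_model(max_macs_g, max_size_mb=float("inf")):
    """Return the largest 16kHz variant within the budget, or None if none fits."""
    fitting = [
        m for m in MODELS_16K
        if m[2] <= max_macs_g and m[3] <= max_size_mb
    ]
    if not fitting:
        return None
    # Assumption: more MACs corresponds to better enhancement quality.
    return max(fitting, key=lambda m: m[2])[0]


print(pick_model(2.5))                    # -> dpdfnet4
print(pick_model(5.0, max_size_mb=11.0))  # -> dpdfnet2
print(pick_model(0.1))                    # -> None
```

The returned name matches the `.tflite` file names listed at the top of this page.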

### 48kHz model

| Model        | Params [M] | MACs [G] | TFLite Size [MB] |
| ------------ | :--------: | :------: | :--------------: |
| DPDFNet-2 HR |    2.58    |   2.42   |      11.6        |

---

## Intended Use

Primary task: Real-time, single-channel speech enhancement (noise suppression).

Deployment targets: Mobile devices, embedded NPUs, and edge platforms.

Input and Output:

* **16kHz models**
  * Input: 16kHz mono noisy speech waveform
  * Output: 16kHz mono enhanced speech waveform
* **48kHz model**
  * Input: 48kHz mono noisy speech waveform
  * Output: 48kHz mono enhanced speech waveform
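
A quick way to confirm that a file matches these expectations before enhancement is to inspect its WAV header. This is a minimal sketch using only the Python standard library; `check_wav` is a hypothetical helper, and the inference script may well handle resampling or downmixing on its own.

```python
import wave

# Sample rate expected by each model, taken from the lists above.
MODEL_SAMPLE_RATES = {
    "baseline": 16000,
    "dpdfnet2": 16000,
    "dpdfnet4": 16000,
    "dpdfnet8": 16000,
    "dpdfnet2_48khz_hr": 48000,
}


def check_wav(path, model_name):
    """Return a list of mismatches between a PCM WAV file and the model's input format."""
    expected_sr = MODEL_SAMPLE_RATES[model_name]
    problems = []
    # Note: the wave module reads PCM WAV files only.
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, found {wav.getnchannels()} channels")
        if wav.getframerate() != expected_sr:
            problems.append(f"expected {expected_sr} Hz, found {wav.getframerate()} Hz")
    return problems
```

An empty list means the file is already mono at the sample rate the chosen model expects.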

Typical applications:

* Voice calls and VoIP
* Video conferencing
* Always-on voice interfaces
* Wearables, earbuds, and embedded audio devices

---

## Inference

This repo includes an inference script, `run_tflite.py`, that runs the TFLite models on WAV files using streaming-style, frame-by-frame inference.
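
The frame-by-frame pattern can be sketched in plain Python as follows. This is only an illustration of the streaming idea, not the code in `run_tflite.py`: the hop size, the `process_frame` callable, and both helper names are assumptions, and a real recurrent model would also carry internal state from one call to the next.

```python
def iter_frames(samples, hop_size):
    """Yield consecutive chunks of hop_size samples, zero-padding the last one."""
    for start in range(0, len(samples), hop_size):
        frame = list(samples[start:start + hop_size])
        frame += [0.0] * (hop_size - len(frame))
        yield frame


def enhance_stream(samples, hop_size, process_frame):
    """Apply process_frame to each chunk in order, then trim the padding."""
    out = []
    for frame in iter_frames(samples, hop_size):
        # Stand-in for one model invocation on one frame.
        out.extend(process_frame(frame))
    return out[:len(samples)]


# Identity "model" with a hypothetical hop of 4 samples.
noisy = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
print(enhance_stream(noisy, 4, lambda frame: frame))
# -> [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
```

Because each call sees only the current frame and earlier ones, the output for a frame is available as soon as that frame has been captured.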

> **Note:** When using `dpdfnet2_48khz_hr`, the inference script automatically switches to the 48kHz processing pipeline.

### Setup

Install dependencies:

```bash
pip install numpy soundfile librosa tqdm
pip install tflite-runtime
```

### Model placement

By default, the script loads models from:

* `./<model_name>.tflite`

Place the `.tflite` files at that location (or edit `TFLITE_DIR` in the script to match your layout).

### Run enhancement on a folder of WAVs

The script processes `*.wav` files non-recursively and writes enhanced outputs as 16-bit PCM WAVs:

```bash
python run_tflite.py --noisy_dir /path/to/noisy_wavs --enhanced_dir /path/to/out --model_name dpdfnet8
```
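
To preview which files a non-recursive `*.wav` match would pick up, a sketch like the following can help. `list_input_wavs` is a hypothetical helper; the script's own matching may differ in details such as case sensitivity.

```python
from pathlib import Path


def list_input_wavs(noisy_dir):
    """Return the *.wav files directly inside noisy_dir; subfolders are skipped."""
    return sorted(p for p in Path(noisy_dir).glob("*.wav") if p.is_file())
```

Files in nested folders are not matched, so they would need to be moved or processed in a separate run.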

Available `--model_name` options: `baseline`, `dpdfnet2`, `dpdfnet4`, `dpdfnet8`, `dpdfnet2_48khz_hr`.
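
To compare several variants on the same input folder, the documented command line can be generated per model. This is a sketch under assumptions: `build_command` is a hypothetical helper, the output folder layout is illustrative, and the `subprocess.run` call is left commented out so the snippet only prints the commands.

```python
import shlex
import sys

MODEL_NAMES = ["baseline", "dpdfnet2", "dpdfnet4", "dpdfnet8", "dpdfnet2_48khz_hr"]


def build_command(noisy_dir, enhanced_dir, model_name):
    """Build the argument list for one run_tflite.py invocation."""
    if model_name not in MODEL_NAMES:
        raise ValueError(f"unknown model name: {model_name}")
    return [
        sys.executable, "run_tflite.py",
        "--noisy_dir", noisy_dir,
        "--enhanced_dir", enhanced_dir,
        "--model_name", model_name,
    ]


# Compare the four 16kHz variants, writing each to its own output folder.
for name in MODEL_NAMES[:4]:
    cmd = build_command("/path/to/noisy_wavs", f"/path/to/out/{name}", name)
    print(shlex.join(cmd))
    # import subprocess; subprocess.run(cmd, check=True)  # uncomment to run
```

Writing each model's output to a separate folder keeps the enhanced files from overwriting one another.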

---

## Training Data

The models were trained using a mixture of public speech and noise datasets, including DNS4 (downsampled), MLS, MUSAN, and FSD50K.

---

## Citation

If you use these models, please cite:

```bibtex
@article{rika2025dpdfnet,
  title  = {DPDFNet: Boosting DeepFilterNet2 via Dual-Path RNN},
  author = {Rika, Daniel and Sapir, Nino and Gus, Ido},
  year   = {2025}
}
```

---

## License

Apache-2.0