Step-Audio-EditX

Files needed for the Vantage-Step-Audio-EditX ComfyUI node

Original Model Link: https://huggingface.co/stepfun-ai/Step-Audio-EditX

Watch us on YouTube: @VantageWithAI

After downloading the models, copy them into ComfyUI/models. You should end up with the following structure:

ComfyUI/
├── models/
│   ├── Step-Audio-EditX/
│   ├──── CosyVoice-300M-25Hz/
│   │     ├─── campplus.onnx
│   │     ├─── cosyvoice.yaml
│   │     ├─── flow.pt
│   │     └─── hift.pt
│   ├──── dengcunqin/
│   │     └─── speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/
│   │          ├─── am.mvn
│   │          ├─── config.yaml
│   │          ├─── configuration.json
│   │          ├─── model.pt
│   │          ├─── seg_dict
│   │          ├─── tokens.json
│   │          ├─── tokens.txt
│   │          └─── write_tokens_from_txt.py
│   ├── model.safetensors
│   └── speech_tokenizer_v1.onnx
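
A quick way to confirm the copy went as intended is to check the layout from Python. The sketch below is not part of the node; the function name missing_model_files is hypothetical, and the path list is simply transcribed from the tree above. It assumes model.safetensors and speech_tokenizer_v1.onnx sit directly under ComfyUI/models, as the tree draws them; adjust the last two entries if your setup keeps them inside Step-Audio-EditX/.

```python
from pathlib import Path

# Relative paths transcribed from the tree above (relative to ComfyUI/models).
# Assumption: the last two files live directly under models/, as drawn.
_COSYVOICE = "Step-Audio-EditX/CosyVoice-300M-25Hz"
_PARAFORMER = (
    "Step-Audio-EditX/dengcunqin/"
    "speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online"
)

EXPECTED_FILES = [
    f"{_COSYVOICE}/campplus.onnx",
    f"{_COSYVOICE}/cosyvoice.yaml",
    f"{_COSYVOICE}/flow.pt",
    f"{_COSYVOICE}/hift.pt",
    f"{_PARAFORMER}/am.mvn",
    f"{_PARAFORMER}/config.yaml",
    f"{_PARAFORMER}/configuration.json",
    f"{_PARAFORMER}/model.pt",
    f"{_PARAFORMER}/seg_dict",
    f"{_PARAFORMER}/tokens.json",
    f"{_PARAFORMER}/tokens.txt",
    f"{_PARAFORMER}/write_tokens_from_txt.py",
    "model.safetensors",
    "speech_tokenizer_v1.onnx",
]


def missing_model_files(models_dir):
    """Return the expected relative paths that are not files under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling missing_model_files("ComfyUI/models") returns an empty list when every file is in place, and otherwise lists the paths that still need to be copied.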

Features

Zero-Shot TTS
- Excellent zero-shot TTS cloning for Mandarin, English, Sichuanese, and Cantonese.
- To use a dialect, just add a [Sichuanese] or [Cantonese] tag before your text.
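
As a minimal sketch of the dialect tagging described above, the helper below prepends the tag to the input text. The function name with_dialect is hypothetical and not part of the node, and placing the tag directly before the text with no separating space is an assumption.

```python
# Dialect tags named in the feature list above.
DIALECT_TAGS = ("Sichuanese", "Cantonese")


def with_dialect(text, dialect=None):
    """Prefix text with a dialect tag such as [Cantonese].

    Assumption: the tag goes immediately before the text, with no space.
    """
    if dialect is None:
        return text  # No tag: plain Mandarin or English input.
    if dialect not in DIALECT_TAGS:
        raise ValueError(f"Unsupported dialect tag: {dialect!r}")
    return f"[{dialect}]{text}"
```

For example, with_dialect("你好", "Cantonese") produces the string "[Cantonese]你好", which can then be pasted into the node's text input.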
Emotion and Speaking Style Editing
- Iterative control over emotion and speaking style, with dozens of editing options.
  - Emotion Editing: [ Angry, Happy, Sad, Excited, Fearful, Surprised, Disgusted, etc. ]
  - Speaking Style Editing: [ Act_coy, Older, Child, Whisper, Serious, Generous, Exaggerated, etc. ]
  - Support for more emotions and speaking styles is on the way. 🚀
Paralinguistic Editing
- Precise control over 10 types of paralinguistic features for more natural, human-like, and expressive synthetic audio.
- Supported Tags:
  - [ Breathing, Laughter, Suprise-oh, Confirmation-en, Uhm, Suprise-ah, Suprise-wa, Sigh, Question-ei, Dissatisfaction-hnn ]
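
Because the tag names are matched literally (note the spelling "Suprise"), a small check before synthesis can catch typos. The sketch below assumes tags are written inline in square brackets, as with the dialect tags; that syntax for paralinguistic tags, and the function name unknown_tags, are assumptions rather than part of the node.

```python
import re

# Tag names copied verbatim from the supported list above.
PARALINGUISTIC_TAGS = {
    "Breathing", "Laughter", "Suprise-oh", "Confirmation-en", "Uhm",
    "Suprise-ah", "Suprise-wa", "Sigh", "Question-ei", "Dissatisfaction-hnn",
}

# Matches anything inside a single pair of square brackets.
_TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def unknown_tags(text, extra_allowed=("Sichuanese", "Cantonese")):
    """Return bracketed tags in text that are not in the supported lists."""
    allowed = PARALINGUISTIC_TAGS | set(extra_allowed)
    found = (match.strip() for match in _TAG_PATTERN.findall(text))
    return [tag for tag in found if tag not in allowed]
```

An empty result means every bracketed tag in the text matches a supported name; anything returned is likely a misspelling or an unsupported tag.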

For more examples, see the demo page.

Citation

@misc{yan2025stepaudioeditxtechnicalreport,
      title={Step-Audio-EditX Technical Report}, 
      author={Chao Yan and Boyong Wu and Peng Yang and Pengfei Tan and Guoqiang Hu and Yuxin Zhang and Xiangyu Zhang and Fei Tian and Xuerui Yang and Xiangyu Zhang and Daxin Jiang and Gang Yu},
      year={2025},
      eprint={2511.03601},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2511.03601}, 
}

Paper: Step-Audio-EditX Technical Report (arXiv 2511.03601), published Nov 5, 2025