Paper: Step-Audio-EditX Technical Report (arXiv:2511.03601)
Files Needed for Vantage-Step-Audio-EditX ComfyUI node
Original Model Link: https://huggingface.co/stepfun-ai/Step-Audio-EditX
Watch us on YouTube: @VantageWithAI
ComfyUI Node
After downloading the models, copy them into ComfyUI/models. You should end up with the following structure:
ComfyUI/
└── models/
    ├── Step-Audio-EditX/
    │   ├── CosyVoice-300M-25Hz/
    │   │   ├── campplus.onnx
    │   │   ├── cosyvoice.yaml
    │   │   ├── flow.pt
    │   │   └── hift.pt
    │   └── dengcunqin/
    │       └── speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online/
    │           ├── am.mvn
    │           ├── config.yaml
    │           ├── configuration.json
    │           ├── model.pt
    │           ├── seg_dict
    │           ├── tokens.json
    │           ├── tokens.txt
    │           └── write_tokens_from_txt.py
    ├── model.safetensors
    └── speech_tokenizer_v1.onnx
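To confirm that the copy went as intended, a short script can compare the folder against the tree above. This is a minimal sketch, not part of the node itself. The names `EXPECTED_FILES` and `missing_files` are hypothetical, and it assumes that `model.safetensors` and `speech_tokenizer_v1.onnx` sit directly under `models/`, as the tree appears to show. If your copy nests them inside `Step-Audio-EditX/`, edit the last two entries accordingly.

```python
from pathlib import Path

# Hypothetical helper names; the relative paths are read off the tree above.
ASR_DIR = (
    "Step-Audio-EditX/dengcunqin/"
    "speech_paraformer-large_asr_nat-zh-cantonese-en-16k-vocab8501-online"
)

EXPECTED_FILES = [
    "Step-Audio-EditX/CosyVoice-300M-25Hz/campplus.onnx",
    "Step-Audio-EditX/CosyVoice-300M-25Hz/cosyvoice.yaml",
    "Step-Audio-EditX/CosyVoice-300M-25Hz/flow.pt",
    "Step-Audio-EditX/CosyVoice-300M-25Hz/hift.pt",
    f"{ASR_DIR}/am.mvn",
    f"{ASR_DIR}/config.yaml",
    f"{ASR_DIR}/configuration.json",
    f"{ASR_DIR}/model.pt",
    f"{ASR_DIR}/seg_dict",
    f"{ASR_DIR}/tokens.json",
    f"{ASR_DIR}/tokens.txt",
    f"{ASR_DIR}/write_tokens_from_txt.py",
    # Assumed location: adjust if these live inside Step-Audio-EditX/.
    "model.safetensors",
    "speech_tokenizer_v1.onnx",
]


def missing_files(models_dir):
    """Return the expected relative paths that are not files under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    # Run from the directory that contains ComfyUI/, or change the path.
    missing = missing_files("ComfyUI/models")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```

Running the script from the directory that contains `ComfyUI/` lists any file that is absent or misplaced, which is usually quicker than comparing the folders by eye.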
Zero-Shot TTS
Emotion and Speaking Style Editing
Paralinguistic Editing
For more examples, see the demo page.
@misc{yan2025stepaudioeditxtechnicalreport,
title={Step-Audio-EditX Technical Report},
author={Chao Yan and Boyong Wu and Peng Yang and Pengfei Tan and Guoqiang Hu and Yuxin Zhang and Xiangyu Zhang and Fei Tian and Xuerui Yang and Xiangyu Zhang and Daxin Jiang and Gang Yu},
year={2025},
eprint={2511.03601},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2511.03601},
}