SpeakerLM: End-to-End Versatile Speaker Diarization and Recognition with Multimodal Large Language Models Paper • 2508.06372 • Published Aug 8, 2025
An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification Paper • 2305.12838 • Published May 22, 2023
CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking Paper • 2303.00332 • Published Mar 1, 2023
3D-Speaker-Toolkit: An Open-Source Toolkit for Multimodal Speaker Verification and Diarization Paper • 2403.19971 • Published Mar 29, 2024
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization Paper • 2408.12102 • Published Aug 22, 2024
Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization Paper • 2305.12927 • Published May 22, 2023
3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement Paper • 2306.15354 • Published Jun 27, 2023
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation Paper • 2410.17799 • Published Oct 23, 2024