Fine-tuning MedASR for Indian Regional Languages

#4
by Adityab24840 - opened

Hi, I’m working on medical ASR use cases and looking for guidance on fine-tuning MedASR for Indian regional languages such as Hindi, Tamil, Telugu, Kannada, and Bengali. Any recommendations on datasets, multilingual fine-tuning strategies, or evaluation best practices would be really helpful. Happy to collaborate.

Google org

Thanks for reaching out! MedASR is pre-trained and fine-tuned on English-only data, and we don't yet know how it will perform in other languages. At the very least, you will need a different tokenizer, because the current one covers English only. Unfortunately, we are not very familiar with datasets or evaluation practices for these languages at the moment. If you already have data, though, you should be able to fine-tune the MedASR model by following https://github.com/google-health/medasr/blob/main/notebooks/fine_tune_with_hugging_face.ipynb.
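
To illustrate the tokenizer point, here is a rough sketch of how one might build a character-level CTC vocabulary from target-language transcripts and check which characters an English-only vocabulary would miss. It uses only the Python standard library. The function names `build_ctc_vocab` and `uncovered_characters`, the special tokens, and the choice of a character-level vocabulary are all assumptions made for illustration; this is not MedASR's actual tokenizer format, which may well be different.

```python
import unicodedata
from collections import Counter


def build_ctc_vocab(transcripts, blank_token="<blank>", unk_token="<unk>",
                    word_delimiter="|"):
    """Build a character-level vocabulary (hypothetical helper).

    Token names and the word delimiter are illustrative assumptions,
    not the format MedASR itself expects.
    """
    counts = Counter()
    for text in transcripts:
        # NFC normalization keeps one canonical code-point sequence per character.
        normalized = unicodedata.normalize("NFC", text)
        for ch in normalized:
            # Whitespace is mapped to a single visible word-delimiter symbol.
            counts[word_delimiter if ch.isspace() else ch] += 1

    # Reserve the first ids for the CTC blank and the unknown token.
    vocab = {blank_token: 0, unk_token: 1}
    for symbol in sorted(counts):
        vocab.setdefault(symbol, len(vocab))
    return vocab


def uncovered_characters(text, vocab):
    """Return the sorted non-whitespace characters of `text` missing from `vocab`."""
    normalized = unicodedata.normalize("NFC", text)
    return sorted({ch for ch in normalized
                   if not ch.isspace() and ch not in vocab})


# Toy transcripts, invented for this sketch only.
english_vocab = build_ctc_vocab(["patient has fever"])
hindi_vocab = build_ctc_vocab(["मरीज को बुखार है"])

# Characters of the Hindi sentence that the English-only vocabulary cannot encode.
missing = uncovered_characters("मरीज को बुखार है", english_vocab)
```

Running the same check against a vocabulary built from your own Hindi, Tamil, Telugu, Kannada, or Bengali transcripts should return an empty list, which is a quick sanity test before starting any fine-tuning run.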
