| ARTICLE =""" | |
| **Motivation** | |
| In Africa, as on other continents, people access vital information mainly through their mobile phones. The need for voice-enabled applications therefore spans every sector, from health and food to entertainment (games, social media). | |
| Existing speech recognition services are not available in many African languages, and the speakers of these languages are excluded from the benefits of voice-enabled technologies. | |
| This dataset will boost speech technologies (such as speech-to-text, text-to-speech, speech translation, and speech modeling) for African languages, which hitherto have had few or no public datasets. | |
| **Note:** This is an ongoing effort, and this sprint is only the kick-off. Please feel free to share it with your family and friends and keep recording. | |
| **Benefits of such a dataset** | |
| - A useful dataset for learning audio-related machine learning (automatic speech recognition, text-to-speech, and other types of speech processing). | |
| - It can be used as a simple training and/or evaluation dataset for speech processing tasks. | |
| - An easy dataset to train a model on and get good results. With it, you can easily train a model to recognize numbers in your language. | |
| - Opens up opportunities for more sophisticated speech processing models for African languages. | |
| **What about License and security?** | |
| - The safety and interests of the recorders come first. Based on that, we are exploring options like a gated dataset ([this is an example of a gated dataset](https://huggingface.co/datasets/mozilla-foundation/common_voice_9_0)) to ensure anonymity and safety, as well as a better license for the dataset. | |
| - If you have ideas for better privacy-enhancing processes, or for licensing that is more beneficial to the contributors, please reach out to me. My contact details are below. | |
| **About the dataset** | |
| - The data (metadata, text, and audio recordings) are uploaded to [a public Hugging Face dataset](https://huggingface.co/datasets/chrisjay/crowd-speech-africa). For code lovers, [this](https://huggingface.co/spaces/chrisjay/afro-speech/blob/main/app.py#L90-L106) is the part of our code that handles the upload. | |
| - We do not collect your name, address or other sensitive information. | |
| - If for some reason you want to remove your entry, please reach out by email. | |
| - Your email, if given, is used only to track your progress so that prizes can be given to the top scorers. Emails are temporarily stored in [this private dataset](https://huggingface.co/datasets/chrisjay/african-digits-recording-sprint-email) and deleted immediately after the sprint. | |
| **Contact** | |
| If you have questions or run into issues, contact Chris Emezue at: | |
| - Email: chris.emezue@gmail.com | |
| - [Telegram](https://t.me/realchrisjay) | |
| """ |