Update README.md
README.md CHANGED

@@ -17,24 +17,23 @@ tags:
 - audioprocessing
 - transformer
 ---
-#
+# Arm ExecuTorch Conformer
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-Conformer is a popular Transformer based speech recognition network, suitable for embedded devices. This repository contains FP32 trained weights and the associated tokenizer for an implementation of Conformer. We also include exported quantized program with ExecuTorch, quantized for the ExecuTorch Ethos-U backend allowing an easy deployment on SoCs with an Arm® Ethos™-U NPU.
+Conformer is a popular Transformer-based speech recognition network, suitable for embedded devices. This repository contains an example of FP32 trained weights and the associated tokenizer for an implementation of Conformer. We also include an exported quantized program with ExecuTorch, quantized for the ExecuTorch Ethos-U backend, allowing easy deployment on SoCs with an Arm® Ethos™-U NPU.
 ## Model Details
 
 ### Model Description
 
 Conformer is a popular neural network for speech recognition. This repository contains trained weights for the Conformer implementation in https://github.com/sooftware/conformer/
 
-
 - **Developed by:** Arm
 - **Model type:** Transformer
 - **Language(s) (NLP):** English
 - **License:** BigScience OpenRAIL-M v1.1
 
-### Model Sources
+### Model Sources
 
 <!-- Provide the basic links for the model. -->
 
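For orientation, producing such a quantized ExecuTorch program follows the PT2E quantize-then-lower flow. The sketch below shows that flow with a tiny stand-in module and the generic XNNPACK quantizer; it is an assumption-laden outline, not this repository's exact recipe, and an Ethos-U deployment would instead use the quantizer, partitioner and compile specs from ExecuTorch's Arm backend, as covered in the Arm-Examples instructions referenced in this README.

```python
# Illustrative sketch of ExecuTorch's PT2E quantize-then-lower flow, NOT the
# repository's exact recipe: the Ethos-U deployment swaps in the quantizer,
# partitioner and compile specs from ExecuTorch's Arm backend. A tiny
# stand-in module is used so the sketch runs end to end. Import paths for
# the quantizer vary across PyTorch/ExecuTorch releases.
import torch
from torch.ao.quantization.quantize_pt2e import prepare_pt2e, convert_pt2e
from torch.ao.quantization.quantizer.xnnpack_quantizer import (
    XNNPACKQuantizer,
    get_symmetric_quantization_config,
)
from executorch.exir import to_edge

model = torch.nn.Sequential(torch.nn.Linear(80, 64), torch.nn.ReLU()).eval()  # stand-in for the Conformer
example_inputs = (torch.randn(1, 80),)

# 1. Capture the graph (the capture API has shifted across PyTorch releases).
captured = torch.export.export_for_training(model, example_inputs).module()

# 2. Insert observers, then calibrate on representative batches.
quantizer = XNNPACKQuantizer().set_global(get_symmetric_quantization_config())
prepared = prepare_pt2e(captured, quantizer)
for _ in range(4):
    prepared(torch.randn(1, 80))  # real code would feed LibriSpeech features

# 3. Convert to a quantized graph, lower to Edge IR, serialize the program.
quantized = convert_pt2e(prepared)
program = to_edge(torch.export.export(quantized, example_inputs)).to_executorch()
with open("conformer_quantized.pte", "wb") as f:
    f.write(program.buffer)
```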
@@ -116,7 +115,7 @@ We used the LibriSpeech 960h dataset. The dataset is composed of 460h of clean a
 If you want to train the Conformer model from scratch, you can do so by following the instructions in https://github.com/Arm-Examples/ML-examples/tree/main/pytorch-conformer-train-quantize/training
 We used an AWS g5.24xlarge instance to train the NN.
 
-#### Preprocessing
+#### Preprocessing
 
 We first train a tokenizer on the LibriSpeech dataset. The tokenizer converts labels into tokens. For example, in English it is very common to have 's at the end of words; the tokenizer will identify that pattern and have a dedicated token for the 's combination.
 The code to obtain the tokenizer is available in https://github.com/Arm-Examples/ML-examples/blob/main/pytorch-conformer-train-quantize/training/build_sp_128_librispeech.py . The trained tokenizer is also available in the Hugging Face repository.
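The `sp_128` in that script name suggests a SentencePiece model with a 128-token vocabulary. As a rough sketch of what such a script does (the paths, model type and training options here are assumptions; the authoritative recipe is build_sp_128_librispeech.py itself):

```python
# Rough sketch of training a subword tokenizer on LibriSpeech transcripts.
# Assumes the transcripts have been collected into one text file, one
# utterance per line; see build_sp_128_librispeech.py for the real recipe.
import sentencepiece as spm

spm.SentencePieceTrainer.train(
    input="librispeech_transcripts.txt",  # hypothetical path
    model_prefix="sp_128",                # writes sp_128.model / sp_128.vocab
    vocab_size=128,                       # inferred from the script name
    model_type="unigram",                 # assumption; BPE is also common
    character_coverage=1.0,               # English-only corpus
)

# Usage: encode a label into tokens ("'s" typically becomes its own piece).
sp = spm.SentencePieceProcessor(model_file="sp_128.model")
print(sp.encode("the cat's whiskers", out_type=str))
```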
@@ -136,12 +135,6 @@ recommended by the paper and 512 FFTs.
 - **Warmup-epochs:** 2.0
 
 
-
-
-<!-- This section describes the evaluation protocols and provides the results. -->
-
-### Testing Data, Factors & Metrics
-
-#### Testing Data
+### Testing Data
 
-We test the model on the LibriSpeech `test-clean` dataset and obtain 7% Word Error Rate.
+We test the model on the LibriSpeech `test-clean` dataset and obtain a 7% Word Error Rate. The accuracy of the model may be improved through training with additional datasets, and through data augmentation techniques such as time slicing.
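The hunk above carries the feature-extraction settings in its context line (filterbank sizes recommended by the paper, and 512 FFTs). That front end maps onto torchaudio roughly as follows; only the 512-point FFT is stated in the README, while the 80 mel bins and 25 ms / 10 ms framing are the Conformer paper's values and an assumption about this recipe:

```python
# Sketch of the log-mel front end implied by the stated settings.
# n_fft=512 is stated in the README; the rest follows the Conformer paper
# (80 mel bins, 25 ms window, 10 ms hop at LibriSpeech's 16 kHz).
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000,
    n_fft=512,
    win_length=400,   # 25 ms at 16 kHz (assumption)
    hop_length=160,   # 10 ms at 16 kHz (assumption)
    n_mels=80,        # paper-recommended filterbank size (assumption)
)

waveform = torch.randn(1, 16000)            # 1 s of dummy audio
features = torch.log(mel(waveform) + 1e-6)  # (1, 80, frames) log-mel features
```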
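A minimal sketch of reproducing that measurement, assuming torchaudio's LibriSpeech loader and the jiwer package; the `transcribe` function is a hypothetical placeholder for the repository's actual inference and decoding code:

```python
# Sketch of scoring WER on LibriSpeech test-clean with jiwer.
# `transcribe` is a hypothetical stand-in for the repository's actual
# inference + decoding path; wire the real model and tokenizer in there.
import jiwer
from torchaudio.datasets import LIBRISPEECH

def transcribe(waveform) -> str:
    raise NotImplementedError("replace with Conformer inference + tokenizer decode")

testset = LIBRISPEECH("data/", url="test-clean", download=True)

references, hypotheses = [], []
for waveform, sample_rate, transcript, *_ in testset:
    references.append(transcript.lower())
    hypotheses.append(transcribe(waveform))

print(f"WER: {jiwer.wer(references, hypotheses):.1%}")  # README reports 7%
```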