Update README.md
README.md
CHANGED
@@ -118,7 +118,6 @@ We used an AWS g5.24xlarge instance to train the NN.
#### Preprocessing [optional]
-[More Information Needed]
We first train a tokenizer on the LibriSpeech dataset. The tokenizer converts labels into tokens. For example, in English it is very common to have 's at the end of words; the tokenizer will identify that pattern and assign a dedicated token to the 's combination.
The code to obtain the tokenizer is available at https://github.com/Arm-Examples/ML-examples/blob/main/pytorch-conformer-train-quantize/training/build_sp_128_librispeech.py. The trained tokenizer is also available in the Hugging Face repository.
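The subword idea above can be illustrated with a toy greedy longest-match tokenizer over a hand-picked vocabulary. This is only a sketch of the concept, not the learned subword model produced by `build_sp_128_librispeech.py`; the vocabulary and function below are hypothetical:

```python
# Toy illustration of subword tokenization: "'s" gets its own token.
# Hypothetical vocabulary, not the actual trained tokenizer.

def tokenize(word, vocab):
    """Greedy longest-match-first subword split against a fixed vocab."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, shrinking toward one char.
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocab entry covers this character: emit an unknown token.
            tokens.append("<unk>")
            i += 1
    return tokens

vocab = {"john", "'s", "s"}
print(tokenize("john's", vocab))  # ['john', "'s"]
```

Because "'s" is a dedicated vocabulary entry, it is matched as a single token rather than being split into an apostrophe and an "s".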
|