readme: include number of training epochs
README.md CHANGED
```diff
@@ -22,12 +22,13 @@ Preliminary Historic Multilingual and Monolingual ByT5 Models. Following languag
 
 More details can be found in [our GitHub repository](https://github.com/stefan-it/hmByT5).
 
-
 # Pretraining
 
 We use the official JAX/FLAX example in Hugging Face Transformers to pretrain a ByT5 model on a single v3-8 TPU.
 Details about the training can be found [here](https://github.com/stefan-it/hmByT5/tree/main/hmbyt5-flax).
 
+The model was trained for 0.5 epoch.
+
 # Evaluation on Downstream Tasks (NER)
 
 We evaluated the hmByT5 model on downstream tasks:
```
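The pretraining setup referenced in the diff (the official JAX/FLAX example in Hugging Face Transformers) yields a standard Flax T5 checkpoint. As a minimal sketch of loading and inspecting such a checkpoint — the model id below is a placeholder, not one named in this commit:

```python
import jax
from transformers import AutoTokenizer, FlaxT5ForConditionalGeneration

# NOTE: placeholder model id, not taken from this commit; substitute the
# actual id of this repository when loading.
model_id = "stefan-it/hmbyt5-example"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = FlaxT5ForConditionalGeneration.from_pretrained(model_id)

# ByT5 tokenizes raw UTF-8 bytes (offset past the special tokens), so historic
# spellings and rare characters never fall out of vocabulary.
print(tokenizer("Zuͤrich anno 1650").input_ids)

# Count pretrained parameters via the Flax parameter pytree.
n_params = sum(p.size for p in jax.tree_util.tree_leaves(model.params))
print(f"{n_params / 1e6:.1f}M parameters")
```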
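The added sentence states that pretraining ran for 0.5 epoch. How that fraction maps to optimizer steps depends on corpus and batch size; the numbers below are purely illustrative assumptions, not values from this commit or the hmByT5 repository:

```python
# Illustrative only: none of these numbers come from the commit.
num_training_examples = 10_000_000   # assumed corpus size
per_device_batch_size = 8            # assumed
num_devices = 8                      # a v3-8 TPU has 8 cores
effective_batch_size = per_device_batch_size * num_devices  # 64

steps_per_epoch = num_training_examples // effective_batch_size
total_steps = int(0.5 * steps_per_epoch)
print(total_steps)  # 78125 with the assumed numbers
```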