Update README.md (#3)

- Update README.md (04866653deace7e94b2f8eea016437446e72d35e)
README.md CHANGED

@@ -158,11 +158,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Direct Use and Downstream Use
 
-
-
-> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
-
-See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+See the [research paper](https://arxiv.org/pdf/2101.03961.pdf) for further details.
 
 ## Out-of-Scope Use
 
@@ -193,11 +189,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 ## Training Procedure
 
-According to the model card from the [original paper](https://arxiv.org/pdf/
-
-> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
-
-The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+According to the model card from the [original paper](https://arxiv.org/pdf/2101.03961.pdf) the model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
 # Evaluation
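The second hunk's context line notes the model was trained on a Masked Language Modeling task on Colossal Clean Crawled Corpus. For readers of the card, here is a minimal, framework-free sketch of T5-style span corruption, the MLM variant that SwitchTransformers inherit from T5; the `corrupt_spans` helper, the sentinel naming, and the example spans are illustrative assumptions, not code from the repository or the `t5x` codebase.

```python
# Minimal sketch of T5-style span corruption (assumed formulation):
# each masked span in the input is replaced by one sentinel token
# (<extra_id_0>, <extra_id_1>, ...), and the target reproduces the
# sentinels followed by the tokens they masked.
def corrupt_spans(tokens, spans):
    """Replace each (start, end) token span with a sentinel.

    Returns (inputs, targets) as token lists.
    """
    inputs, targets = [], []
    last = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[last:start])   # keep tokens before the span
        inputs.append(sentinel)             # mask the span with a sentinel
        targets.append(sentinel)            # target: sentinel ...
        targets.extend(tokens[start:end])   # ... then the masked tokens
        last = end
    inputs.extend(tokens[last:])            # keep the tail
    return inputs, targets

tokens = "The cute dog walks in the park".split()
inp, tgt = corrupt_spans(tokens, [(1, 3), (5, 6)])
print(" ".join(inp))  # The <extra_id_0> walks in <extra_id_1> park
print(" ".join(tgt))  # <extra_id_0> cute dog <extra_id_1> the
```

The encoder sees the corrupted input and the decoder is trained to emit the target, which is why sentinel tokens such as `<extra_id_0>` appear in the card's inference snippet that ends with `print(tokenizer.decode(outputs[0]))`.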