Update README.md (#3)

- Update README.md (04866653deace7e94b2f8eea016437446e72d35e)
README.md CHANGED

@@ -158,11 +158,7 @@ print(tokenizer.decode(outputs[0]))
 
 ## Direct Use and Downstream Use
 
-
-
-> The primary use is research on language models, including: research on zero-shot NLP tasks and in-context few-shot learning NLP tasks, such as reasoning, and question answering; advancing fairness and safety research, and understanding limitations of current large language models
-
-See the [research paper](https://arxiv.org/pdf/2210.11416.pdf) for further details.
+See the [research paper](https://arxiv.org/pdf/2101.03961.pdf) for further details.
 
 ## Out-of-Scope Use
 
@@ -193,11 +189,7 @@ The model was trained on a Masked Language Modeling task, on Colossal Clean Craw
 
 ## Training Procedure
 
-According to the model card from the [original paper](https://arxiv.org/pdf/
-
-> These models are based on pretrained SwitchTransformers and are not fine-tuned. It is normal if they perform well on zero-shot tasks.
-
-The model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
+According to the model card from the [original paper](https://arxiv.org/pdf/2101.03961.pdf) the model has been trained on TPU v3 or TPU v4 pods, using [`t5x`](https://github.com/google-research/t5x) codebase together with [`jax`](https://github.com/google/jax).
 
 # Evaluation
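The second hunk's context line notes the model was trained on a Masked Language Modeling task on Colossal Clean Crawled Corpus. For readers of the card, here is a minimal, framework-free sketch of T5-style span corruption, the MLM variant that SwitchTransformers inherit from T5; the `corrupt_spans` helper, the sentinel naming, and the example spans are illustrative assumptions, not code from the repository or the `t5x` codebase.

```python
# Minimal sketch of T5-style span corruption (assumed formulation):
# each masked span in the input is replaced by one sentinel token
# (<extra_id_0>, <extra_id_1>, ...), and the target reproduces the
# sentinels followed by the tokens they masked.
def corrupt_spans(tokens, spans):
    """Replace each (start, end) token span with a sentinel.

    Returns (inputs, targets) as token lists.
    """
    inputs, targets = [], []
    last = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[last:start])   # keep tokens before the span
        inputs.append(sentinel)             # mask the span with a sentinel
        targets.append(sentinel)            # target: sentinel ...
        targets.extend(tokens[start:end])   # ... then the masked tokens
        last = end
    inputs.extend(tokens[last:])            # keep the tail
    return inputs, targets

tokens = "The cute dog walks in the park".split()
inp, tgt = corrupt_spans(tokens, [(1, 3), (5, 6)])
print(" ".join(inp))  # The <extra_id_0> walks in <extra_id_1> park
print(" ".join(tgt))  # <extra_id_0> cute dog <extra_id_1> the
```

The encoder sees the corrupted input and the decoder is trained to emit the target, which is why sentinel tokens such as `<extra_id_0>` appear in the card's inference snippet that ends with `print(tokenizer.decode(outputs[0]))`.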