Commit 2d8805f · Update README.md
Parent: 9249df4

README.md CHANGED
@@ -243,8 +243,7 @@ Play with the model on the [TODO Playground](https://huggingface.co/spaces/bigco
 
 1. [Model Summary](##model-summary)
 2. [Use](##use)
-3. [
-4. [Training](##training)
+3. [Training](##training)
 5. [License](##license)
 6. [Citation](##citation)
 
@@ -309,16 +308,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-
+## Training
 
-
+### Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-
+### Hardware
 
 - **Pretraining:**
   - **GPUs:** 512 Tesla A100
@@ -327,17 +326,17 @@ print(tokenizer.decode(outputs[0]))
   - **GPUs:** 8 Tesla A100
   - **Training time:** 4 hours
 
-
+### Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-##
+## License
 
 本仓库的代码依照 [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) 协议开源,模型的权重的使用则需要遵循 [Model License](MODEL_LICENSE)。
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-
+## Citation
 
 TODO
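The `### Model` bullets in this commit name multi-query attention as the change over a vanilla GPT-2 architecture. As a rough, self-contained sketch only (a toy NumPy illustration, not this model's actual implementation — real inference should go through `transformers` as in the README snippet), multi-query attention gives each head its own queries but shares a single key/value head across all of them:

```python
import numpy as np

def multi_query_attention(x, wq, wk, wv, n_heads):
    """Toy causal multi-query attention.

    x: (seq, d_model); wq: (d_model, d_model) gives per-head queries.
    wk, wv: (d_model, head_dim) -- a single K/V head shared by all
    query heads, which is what distinguishes MQA from standard MHA.
    """
    seq, d_model = x.shape
    head_dim = d_model // n_heads
    q = (x @ wq).reshape(seq, n_heads, head_dim)  # per-head queries
    k = x @ wk                                    # (seq, head_dim), shared
    v = x @ wv                                    # (seq, head_dim), shared
    scores = np.einsum("qhd,kd->hqk", q, k) / np.sqrt(head_dim)
    causal = np.triu(np.ones((seq, seq), dtype=bool), k=1)
    scores = np.where(causal, -1e9, scores)       # mask future positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # softmax over keys
    out = np.einsum("hqk,kd->qhd", w, v)          # all heads read shared V
    return out.reshape(seq, d_model)
```

Sharing one K/V head shrinks the key/value cache by a factor of the head count, which is the main reason MQA speeds up batched autoregressive generation relative to standard multi-head attention.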