Commit 0d03e09
Parent(s): 2d8805f

Update README.md

README.md CHANGED
@@ -235,19 +235,14 @@ model-index:
 
 
 
-#
+# Table of Contents
 
-
+1. [Model Summary](#model-summary)
+2. [Use](#use)
+3. [Training](#training)
+4. [Citation](#citation)
 
-
-
-1. [Model Summary](##model-summary)
-2. [Use](##use)
-3. [Training](##training)
-5. [License](##license)
-6. [Citation](##citation)
-
-## Model Summary
+# Model Summary
 
 OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning [CodeGeeX2](https://huggingface.co/THUDM/codegeex2-6b) on [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) & [OASST](https://huggingface.co/datasets/bigcode/oasst-octopack) as described in the OctoPack paper.
 
@@ -284,15 +279,15 @@ OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning
 </table>
 
 
-
+# Use
 
-
+## Intended use
 
 The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
 
 **Feel free to share your generations in the Community tab!**
 
-
+## Generation
 ```python
 # pip install -q transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -308,16 +303,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-
+# Training
 
-
+## Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-
+## Hardware
 
 - **Pretraining:**
   - **GPUs:** 512 Tesla A100
@@ -326,17 +321,17 @@ print(tokenizer.decode(outputs[0]))
   - **GPUs:** 8 Tesla A100
   - **Training time:** 4 hours
 
-
+## Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-
+# License
 
 The code in this repository is open-sourced under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license; use of the model weights must comply with the [Model License](MODEL_LICENSE).
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-
+# Citation
 
 TODO
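
The generation snippet in the README is cut off by the hunk boundaries above (the lines that load the tokenizer and model are not shown in this diff). For reference, a minimal end-to-end sketch that follows the recommended "Question: ... Answer:" prompt format; the checkpoint id `bigcode/octogeex`, the `trust_remote_code=True` flag, and the generation settings are illustrative assumptions, not taken from this commit.

```python
# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id for illustration; replace with the actual model repo.
checkpoint = "bigcode/octogeex"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is assumed here because the base model (CodeGeeX2)
# ships custom modeling code; drop it if the hosted checkpoint does not need it.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Build the recommended instruction prompt.
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a completion; max_new_tokens is an arbitrary illustrative choice.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```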