Commit 0d03e09
Parent(s): 2d8805f

Update README.md

README.md CHANGED
@@ -235,19 +235,14 @@ model-index:
 
 
 
-#
+# Table of Contents
 
-
+1. [Model Summary](#model-summary)
+2. [Use](#use)
+3. [Training](#training)
+4. [Citation](#citation)
 
-
-
-1. [Model Summary](##model-summary)
-2. [Use](##use)
-3. [Training](##training)
-5. [License](##license)
-6. [Citation](##citation)
-
-## Model Summary
+# Model Summary
 
 OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning [CodeGeeX2](https://huggingface.co/THUDM/codegeex2-6b) on [CommitPackFT](https://huggingface.co/datasets/bigcode/commitpackft) & [OASST](https://huggingface.co/datasets/bigcode/oasst-octopack) as described in the OctoPack paper.
 
@@ -284,15 +279,15 @@ OctoGeeX is an instruction tuned model with 6B parameters created by fine-tuning
 </table>
 
 
-
+# Use
 
-
+## Intended use
 
 The model follows instructions provided in the input. We recommend prefacing your input with "Question: " and finishing with "Answer:", for example: "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
 
 **Feel free to share your generations in the Community tab!**
 
-
+## Generation
 ```python
 # pip install -q transformers
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -308,16 +303,16 @@ outputs = model.generate(inputs)
 print(tokenizer.decode(outputs[0]))
 ```
 
-
+# Training
 
-
+## Model
 
 - **Architecture:** GPT-2 model with multi-query attention and Fill-in-the-Middle objective
 - **Steps:** 250k pretraining & 30 instruction tuning
 - **Pretraining tokens:** 1 trillion pretraining & 2M instruction tuning
 - **Precision:** bfloat16
 
-
+## Hardware
 
 - **Pretraining:**
   - **GPUs:** 512 Tesla A100
@@ -326,17 +321,17 @@ print(tokenizer.decode(outputs[0]))
   - **GPUs:** 8 Tesla A100
   - **Training time:** 4 hours
 
-
+## Software
 
 - **Orchestration:** [Megatron-LM/Transformers](https://github.com/bigcode-project/octopack#training)
 - **Neural networks:** [PyTorch](https://github.com/pytorch/pytorch)
 
-
+# License
 
 The code in this repository is open-sourced under the [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) license; use of the model weights must comply with the [Model License](MODEL_LICENSE).
 
 The code in this repository is open-source under the [MIT license](https://github.com/bigcode-project/octopack/blob/main/LICENSE). The model weights are licensed under the [Model License](MODEL_LICENSE).
 
-
+# Citation
 
 TODO
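
The generation snippet in the README is cut off by the hunk boundaries above (the lines that load the tokenizer and model are not shown in this diff). For reference, a minimal end-to-end sketch that follows the recommended "Question: ... Answer:" prompt format; the checkpoint id `bigcode/octogeex`, the `trust_remote_code=True` flag, and the generation settings are illustrative assumptions, not taken from this commit.

```python
# pip install -q transformers torch
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint id for illustration; replace with the actual model repo.
checkpoint = "bigcode/octogeex"
device = "cuda" if torch.cuda.is_available() else "cpu"

# trust_remote_code=True is assumed here because the base model (CodeGeeX2)
# ships custom modeling code; drop it if the hosted checkpoint does not need it.
tokenizer = AutoTokenizer.from_pretrained(checkpoint, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True).to(device)

# Build the recommended instruction prompt.
prompt = "Question: Please write a function in Python that performs bubble sort.\n\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

# Generate a completion; max_new_tokens is an arbitrary illustrative choice.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```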