codefuse-ai/CodeFuse-Mixtral-8x7B

@@ -1,12 +1,10 @@
 ---
-license: apache-2.0
 tasks:
 - code-generation
 ---
-# Model Card for CodeFuse-CodeLlama-34B
-<p align="center">
-    <img src="https://modelscope.cn/api/v1/models/codefuse-ai/CodeFuse-QWen-14B/repo?Revision=master&FilePath=LOGO.jpg&View=true" width="800"/>
-<p>
 [[中文]](#chinese)    [[English]](#english)
@@ -16,13 +14,27 @@ tasks:
 ## Model Description
-CodeFuse-CodeLlama-34B is a 34B Code-LLM finetuned with QLoRA on multiple code tasks (600k instructions/answers), starting from the base model CodeLlama-34b-Python.
-The context length of finetuning is 4K while it is able to be finetuned by 16k context if necessary.
 <br>
 ## News and Updates
-🔥🔥🔥 CodeFuse-CodeLlama34B-MFT has achieved 74.4% pass@1 on HumanEval, which is SOTA at present.
 <br>
@@ -36,12 +48,21 @@ The context length of finetuning is 4K while it is able to be finetuned by 16k c
 + If you wish to see a demo of the model, you can visit ✨[CodeFuse Demo](https://github.com/codefuse-ai/codefuse)✨✨
 ## Performance
 | Model                       | HumanEval(pass@1) |  Date   |
 |:----------------------------|:-----------------:|:-------:|
-| **CodeFuse-CodeLlama-34B**  |     **74.4%**      | 2023.9  |
 | WizardCoder-Python-34B-V1.0 |       73.2%       | 2023.8  |
 | GPT-4(zero-shot)            |       67.0%       | 2023.3  |
 | PanGu-Coder2 15B            |       61.6%       | 2023.8  |
@@ -50,7 +71,14 @@ The context length of finetuning is 4K while it is able to be finetuned by 16k c
 | GPT-3.5(zero-shot)          |       48.1%       | 2022.11 |
 | OctoCoder                   |       46.2%       | 2023.8  |
 | StarCoder-15B               |       33.6%       | 2023.5  |
-| LLaMA 2 70B(zero-shot)      |       29.9%       | 2023.7  |
 <br>
@@ -58,7 +86,7 @@ The context length of finetuning is 4K while it is able to be finetuned by 16k c
 * python>=3.8
 * pytorch>=2.0.0
-* transformers==4.32.0
 * Sentencepiece
 * CUDA 11.4
   <br>
@@ -66,93 +94,143 @@ The context length of finetuning is 4K while it is able to be finetuned by 16k c
 ##  Inference String Format
 The inference string is formed by concatenating conversation data (system, human, and bot contents) in the training data format. It is used as the model input during inference.
-Here is an example format of the concatenated string:
 ```python
 """
-<|role_start|>system<|role_end|>System instruction
-<|role_start|>human<|role_end|>Human 1st round input
-<|role_start|>bot<|role_end|>Bot 1st round output</s>
-<|role_start|>human<|role_end|>Human 2nd round input
-<|role_start|>bot<|role_end|>Bot 2nd round output</s>
 ...
 ...
 ...
-<|role_start|>human<|role_end|>Human nth round input
-<|role_start|>bot<|role_end|>{Bot output to be genreated}</s>
 """
 ```
-When running inference, always end your input string with "<|role_start|>bot<|role_end|>" so that the model generates the answer.
-## Quickstart
-```bash
-pip install -r requirements.txt
 ```
-```python
-import torch
-from modelscope import AutoTokenizer, AutoModelForCausalLM, snapshot_download
-model_dir = snapshot_download('codefuse-ai/CodeFuse-CodeLlama-34B', revision='v1.0.0')
-tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False, legacy=False)
-tokenizer.padding_side = "left"
-tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids("<unk>")
-tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids("</s>")
-model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True,
-                                             device_map='auto',
-                                             torch_dtype=torch.bfloat16)
-HUMAN_ROLE_START_TAG = "<|role_start|>human<|role_end|>"
-BOT_ROLE_START_TAG = "<|role_start|>bot<|role_end|>"
-text = f"{HUMAN_ROLE_START_TAG}write a python function of quick sort.{BOT_ROLE_START_TAG}"
-inputs = tokenizer(text, return_tensors='pt', padding=True, add_special_tokens=False).to("cuda")
-outputs = model.generate(
-        inputs=inputs["input_ids"],
-        attention_mask=inputs["attention_mask"],
         max_new_tokens=512,
         top_p=0.95,
-        temperature=0.1,
-        do_sample=True,
-        eos_token_id=tokenizer.eos_token_id,
-        pad_token_id=tokenizer.pad_token_id
-    )
-gen_text = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
-print(gen_text)
 ```
-## MD5
-We notice that the file may be corrupted during transfer process. Please check MD5 value before use.
-| Model File                       | MD5 Value                        |
-|:---------------------------------|:--------------------------------:|
-| pytorch_model-00001-of-00007.bin | 8d544b1bcb3449934184d4141137329c |
-| pytorch_model-00002-of-00007.bin | 9d5dbb30911e48a42fb6d0fcabb322a4 |
-| pytorch_model-00003-of-00007.bin | b0d4aecee0457d9332005a187e1fffed |
-| pytorch_model-00004-of-00007.bin | 5c7e002de5eab77d0194a2b0f6de0c24 |
-| pytorch_model-00005-of-00007.bin | d22a511aa26b5b17117b665a877490ab |
-| pytorch_model-00006-of-00007.bin | a5c28ac277fac07d16dd66537e54d109 |
-| pytorch_model-00007-of-00007.bin | a967e2c6195477b7407089c0bffa2d53 |
 <a id="chinese"></a>
 ## 模型简介
-CodeFuse-CodeLlama34B-MFT 是一个通过QLoRA对基座模型CodeLlama-34b-Python进行多代码任务微调的代码大模型。模型微调采用了4k上下文。如果有必要，可以扩展到16k。
 <br>
 ## 新闻
-🔥🔥🔥 CodeFuse-CodeLlama34B-MFT模型在HumanEval pass@1上可以达到74.4%, 为当前开源SOTA。
 <br>
 ## 代码社区
-**大本营**： 🏡 https://github.com/codefuse-ai （**欢迎为我们的项目一键三连 Star🌟 + Fork🚀 + Watch👀**）
 + 如果您想自己微调该模型，可以访问 ✨[MFTCoder](https://github.com/codefuse-ai/MFTCoder)✨✨
@@ -160,11 +238,18 @@ CodeFuse-CodeLlama34B-MFT 是一个通过QLoRA对基座模型CodeLlama-34b-Pytho
 + 如果您想观看该模型示例，可以访问 ✨[CodeFuse Demo](https://github.com/codefuse-ai/codefuse)✨✨
-## 评测表现(代码)
 | 模型                          | HumanEval(pass@1) |   日期    |
 |:----------------------------|:-----------------:|:-------:|
-| **CodeFuse-CodeLlama-34B**  |     **74.4%**      | 2023.9  |
 | WizardCoder-Python-34B-V1.0 |       73.2%       | 2023.8  |
 | GPT-4(zero-shot)            |       67.0%       | 2023.3  |
 | PanGu-Coder2 15B            |       61.6%       | 2023.8  |
@@ -173,83 +258,125 @@ CodeFuse-CodeLlama34B-MFT 是一个通过QLoRA对基座模型CodeLlama-34b-Pytho
 | GPT-3.5(zero-shot)          |       48.1%       | 2022.11 |
 | OctoCoder                   |       46.2%       | 2023.8  |
 | StarCoder-15B               |       33.6%       | 2023.5  |
-| LLaMA 2 70B(zero-shot)      |       29.9%       | 2023.7  |
-<br>
 ## Requirements
 * python>=3.8
 * pytorch>=2.0.0
-* transformers==4.32.0
 * CUDA 11.4
 <br>
 ## 推理数据格式
-推理数据为模型在训练数据格式下拼接的字符串形式，它也是推理时输入prompt拼接的方式：
 ```python
 """
-<|role_start|>system<|role_end|>这是System指令
-<|role_start|>human<|role_end|>这是第1轮用户输入的问题
-<|role_start|>bot<|role_end|>这是第1轮模型生成的内容</s>
-<|role_start|>human<|role_end|>这是第2轮用户输入的问题
-<|role_start|>bot<|role_end|>这是第2轮模型生成的内容</s>
 ...
 ...
 ...
-<|role_start|>human<|role_end|>这是第n轮用户输入的问题
-<|role_start|>bot<|role_end|>{模型现在要生成的内容}</s>
 """
 ```
-推理时，请确保拼接的prompt字符串以"<|role_start|>bot<|role_end|>"结尾，引导模型生成回答。
-## 快速使用
 ```python
-import torch
-from modelscope import AutoTokenizer, AutoModelForCausalLM, snapshot_download
-model_dir = snapshot_download('codefuse-ai/CodeFuse-CodeLlama-34B', revision='v1.0.0')
-tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True, use_fast=False, legacy=False)
-tokenizer.padding_side = "left"
-tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids("<unk>")
-tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids("</s>")
-model = AutoModelForCausalLM.from_pretrained(model_dir, trust_remote_code=True,
-                                             device_map='auto',
-                                             torch_dtype=torch.bfloat16)
-HUMAN_ROLE_START_TAG = "<|role_start|>human<|role_end|>"
-BOT_ROLE_START_TAG = "<|role_start|>bot<|role_end|>"
-text = f"{HUMAN_ROLE_START_TAG}write a python function of quick sort.{BOT_ROLE_START_TAG}"
-inputs = tokenizer(text, return_tensors='pt', padding=True, add_special_tokens=False).to("cuda")
-outputs = model.generate(
-        inputs=inputs["input_ids"],
-        attention_mask=inputs["attention_mask"],
         max_new_tokens=512,
         top_p=0.95,
-        temperature=0.1,
-        do_sample=True,
-        eos_token_id=tokenizer.eos_token_id,
-        pad_token_id=tokenizer.pad_token_id
-    )
-gen_text = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True)
-print(gen_text)
 ```
-## MD5
-我们发现模型文件可能会在传输过程中损坏，使用前请检查文件MD5值。
-| 模型文件                           | MD5值                            |
-|:---------------------------------|:--------------------------------:|
-| pytorch_model-00001-of-00007.bin | 8d544b1bcb3449934184d4141137329c |
-| pytorch_model-00002-of-00007.bin | 9d5dbb30911e48a42fb6d0fcabb322a4 |
-| pytorch_model-00003-of-00007.bin | b0d4aecee0457d9332005a187e1fffed |
-| pytorch_model-00004-of-00007.bin | 5c7e002de5eab77d0194a2b0f6de0c24 |
-| pytorch_model-00005-of-00007.bin | d22a511aa26b5b17117b665a877490ab |
-| pytorch_model-00006-of-00007.bin | a5c28ac277fac07d16dd66537e54d109 |
-| pytorch_model-00007-of-00007.bin | a967e2c6195477b7407089c0bffa2d53 |

 ---
+license: other
 tasks:
 - code-generation
 ---
+# Model Card for CodeFuse-Mixtral-8x7B
+![logo](LOGO.jpg)
 [[中文]](#chinese)    [[English]](#english)
 ## Model Description
+CodeFuse-Mixtral-8x7B is a Code-LLM finetuned with QLoRA on multiple code-related tasks, starting from the base model Mixtral-8x7B-v0.1 (a Mixture-of-Experts model).
 <br>
 ## News and Updates
+🔥🔥🔥 2024-01-12 CodeFuse-DeepSeek-33B has been released, achieving a pass@1 (greedy decoding) score of 78.65% on HumanEval.
+🔥🔥🔥 2024-01-12 CodeFuse-Mixtral-8x7B has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval, which is a 15% increase compared to Mixtral-8x7b's 40%.
+🔥🔥 2023-11-10 CodeFuse-CodeGeeX2-6B has been released, achieving a pass@1 (greedy decoding) score of 45.12% on HumanEval, which is a 9.22% increase compared to CodeGeeX2's 35.9%.
+🔥🔥 2023-10-20 CodeFuse-QWen-14B technical documentation has been released. For details, see the CodeFuse article on our WeChat official account: https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw
+🔥🔥 2023-10-16 CodeFuse-QWen-14B has been released, achieving a pass@1 (greedy decoding) score of 48.78% on HumanEval, which is a 16% increase compared to Qwen-14b's 32.3%.
+🔥🔥 2023-09-27 CodeFuse-StarCoder-15B has been released, achieving a pass@1 (greedy decoding) score of 54.9% on HumanEval, which is a 21% increase compared to StarCoder's 33.6%.
+🔥🔥 2023-09-26 We are pleased to announce the release of the [4-bit quantized version](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits/summary) of [CodeFuse-CodeLlama-34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary). Despite the quantization process, the model still achieves a remarkable 73.8% accuracy (greedy decoding) on the HumanEval pass@1 metric.
+🔥🔥 2023-09-11 [CodeFuse-CodeLlama34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary) has achieved 74.4% pass@1 (greedy decoding) on HumanEval, which is the SOTA result for open-source LLMs at present.
 <br>
 + If you wish to see a demo of the model, you can visit ✨[CodeFuse Demo](https://github.com/codefuse-ai/codefuse)✨✨
+<br>
 ## Performance
+### Code
 | Model                       | HumanEval(pass@1) |  Date   |
 |:----------------------------|:-----------------:|:-------:|
+| **CodeFuse-DeepSeek-33B**   |     **78.65%**    | 2024.01 |
+| **CodeFuse-Mixtral-8x7B**   |     **56.10%**    | 2024.01 |
+| **CodeFuse-CodeLlama-34B**  |     74.4%      | 2023.9  |
+|**CodeFuse-CodeLlama-34B-4bits** |     73.8%  |  2023.9 |
+| **CodeFuse-StarCoder-15B**  |     54.9%         | 2023.9  |
+| **CodeFuse-QWen-14B**       |     48.78%        | 2023.10 |
+| **CodeFuse-CodeGeeX2-6B**   |     45.12%        | 2023.11 |
 | WizardCoder-Python-34B-V1.0 |       73.2%       | 2023.8  |
 | GPT-4(zero-shot)            |       67.0%       | 2023.3  |
 | PanGu-Coder2 15B            |       61.6%       | 2023.8  |
 | GPT-3.5(zero-shot)          |       48.1%       | 2022.11 |
 | OctoCoder                   |       46.2%       | 2023.8  |
 | StarCoder-15B               |       33.6%       | 2023.5  |
+| Qwen-14b                    |       32.3%       | 2023.10 |
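
For reference, pass@1 under greedy decoding reduces to the fraction of benchmark problems whose single generated completion passes all unit tests. The following is a minimal sketch, assuming per-problem pass/fail results are already available. The function name `pass_at_1` and the 92-of-164 outcome are hypothetical illustrations (164 is the number of problems in HumanEval), not results reported in this card.

```python
from typing import Sequence


def pass_at_1(passed: Sequence[bool]) -> float:
    """Return pass@1 in percent, given one greedy completion per problem."""
    if not passed:
        raise ValueError("need at least one problem result")
    return 100.0 * sum(passed) / len(passed)


# Hypothetical outcome for illustration only: 92 of 164 problems solved.
results = [True] * 92 + [False] * 72
print(f"pass@1 = {pass_at_1(results):.1f}%")
```

Because greedy decoding is deterministic, each problem contributes exactly one sample, so no sampling-based estimator is needed for this metric.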
+### NLP
+![NLP Performance Radar](codefuse-deepseek-33b-nlp.png)
 <br>
 * python>=3.8
 * pytorch>=2.0.0
+* transformers>=4.33.2
 * Sentencepiece
 * CUDA 11.4
   <br>
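
A quick way to confirm that an environment meets the minimum versions listed above is sketched below. It is an illustration rather than part of the project: the helper names `version_tuple` and `check_requirements` are made up here, the `MINIMUMS` table simply restates the list above (pytorch is installed under the distribution name `torch`), and the comparison only looks at the leading numeric parts of each version string.

```python
import re
from importlib import metadata

# Minimum versions restated from the requirements list (assumption: pip names).
MINIMUMS = {"torch": "2.0.0", "transformers": "4.33.2"}


def version_tuple(version: str) -> tuple:
    """Turn '2.1.0+cu118' into (2, 1, 0), ignoring local and pre-release suffixes."""
    numbers = []
    for piece in version.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def check_requirements(minimums=MINIMUMS) -> dict:
    """Map each package to 'ok', 'too old (<installed>)', or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed})"
    return report


print(check_requirements())
```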
 ##  Inference String Format
 The inference string is formed by concatenating conversation data (system, human, and bot contents) in the training data format. It is used as the model input during inference.
+Here are examples of prompts used to query the model:
+**Multi-Round with System Prompt:**
 ```python
 """
+<s>system
+System instruction
+<s>human
+Human 1st round input
+<s>bot
+Bot 1st round output</s>
+<s>human
+Human 2nd round input
+<s>bot
+Bot 2nd round output</s>
 ...
 ...
 ...
+<s>human
+Human nth round input
+<s>bot
 """
 ```
+**Single-Round without System Prompt:**
+```python
+"""
+<s>human
+User prompt...
+<s>bot
+"""
+```
+In this format, the system section is optional and the conversation can be either single-turn or multi-turn. When running inference, always end your input string with "\<s\>bot" so that the model generates the answer.
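
The concatenation described above can be sketched as a small helper. This is an illustration, not code from the project: `build_prompt` and its parameters are hypothetical names, and the exact newline placement is read off the listings above (a newline after each role tag and after each message, with the trailing newline after the bot tag matching `BOT_ROLE_START_TAG` in the Quickstart).

```python
from typing import List, Optional, Tuple

EOS = "</s>"  # end-of-turn marker that follows each completed bot reply


def build_prompt(
    history: List[Tuple[str, str]],
    user_input: str,
    system: Optional[str] = None,
) -> str:
    """Concatenate turns into one inference string that ends with the bot tag."""
    parts = []
    if system:
        parts.append(f"<s>system\n{system}\n")
    for human, bot in history:
        parts.append(f"<s>human\n{human}\n")
        parts.append(f"<s>bot\n{bot}{EOS}\n")
    parts.append(f"<s>human\n{user_input}\n")
    parts.append("<s>bot\n")  # left open so the model generates the reply
    return "".join(parts)


# Single round, no system prompt.
print(build_prompt([], "Write a QuickSort program\n#Python"))
```

Passing earlier `(human, bot)` pairs as `history` yields the multi-round form, and omitting `system` yields the single-round form.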
+For example, the format used for HumanEval inference is as follows:
 ```
+<s>human
+# language: Python
+from typing import List
+def separate_paren_groups(paren_string: str) -> List[str]:
+    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
+    separate those group into separate strings and return the list of those.
+    Separate groups are balanced (each open brace is properly closed) and not nested within each other
+    Ignore any spaces in the input string.
+    >>> separate_paren_groups('( ) (( )) (( )( ))')
+    ['()', '(())', '(()())']
+    """
+<s>bot
+```
+Specifically, we also add the programming language tag used by CodeGeeX models (e.g. "```# language: Python```" for Python).
+## Quickstart
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+def load_model_tokenizer(model_path):
+    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=False, legacy=False)
+    tokenizer.eos_token = "</s>"
+    tokenizer.pad_token = "</s>"
+    tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(tokenizer.eos_token)
+    tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
+    tokenizer.padding_side = "left"
+    model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto',torch_dtype=torch.bfloat16, trust_remote_code=True)
+    return model, tokenizer
+HUMAN_ROLE_START_TAG = "<s>human\n"
+BOT_ROLE_START_TAG = "<s>bot\n"
+text_list = [f'{HUMAN_ROLE_START_TAG}Write a QuickSort program\n#Python\n{BOT_ROLE_START_TAG}']
+model, tokenizer = load_model_tokenizer("codefuse-ai/CodeFuse-Mixtral-8x7B")
+inputs = tokenizer(text_list, return_tensors='pt', padding=True, add_special_tokens=False).to('cuda')
+input_ids = inputs["input_ids"]
+attention_mask = inputs["attention_mask"]
+generation_config = GenerationConfig(
+        eos_token_id=tokenizer.eos_token_id,
+        pad_token_id=tokenizer.pad_token_id,
+        temperature=0.1,
         max_new_tokens=512,
+        num_return_sequences=1,
+        num_beams=1,
         top_p=0.95,
+        do_sample=False
+)
+outputs = model.generate(
+        inputs= input_ids,
+        attention_mask=attention_mask,
+        **generation_config.to_dict()
+)
+gen_text = tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True)
+print(gen_text[0])
 ```
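
The Quickstart pads on the left and then slices `outputs[:, input_ids.shape[1]:]` to keep only the newly generated tokens. The toy sketch below uses plain Python lists instead of tensors and involves no model. The token ids, `PAD_ID`, and the helper names are made up for illustration. It shows why left padding lets a single slice work for every row of a batch.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical pad id, for illustration only


def left_pad(batch: List[List[int]]) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad prompts to a common width and build the matching attention mask."""
    width = max(len(ids) for ids in batch)
    input_ids = [[PAD_ID] * (width - len(ids)) + ids for ids in batch]
    attention_mask = [[0] * (width - len(ids)) + [1] * len(ids) for ids in batch]
    return input_ids, attention_mask


def strip_prompt(outputs: List[List[int]], prompt_width: int) -> List[List[int]]:
    """List equivalent of outputs[:, prompt_width:] on a tensor."""
    return [row[prompt_width:] for row in outputs]


prompts = [[11, 12, 13], [21]]
input_ids, attention_mask = left_pad(prompts)
# Pretend the model appended two new tokens to every padded row.
outputs = [row + [91, 92] for row in input_ids]
print(input_ids)
print(attention_mask)
print(strip_prompt(outputs, len(input_ids[0])))
```

With right padding, the pad tokens would sit between a short prompt and its continuation, so a fixed-width slice would no longer separate prompt from generated text.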
 <a id="chinese"></a>
 ## 模型简介
+CodeFuse-Mixtral-8x7B 是一个通过QLoRA对基座模型Mixtral-8x7B-v0.1进行多代码任务微调而得到的代码大模型。
 <br>
 ## 新闻
+🔥🔥🔥 2024-01-12 CodeFuse-DeepSeek-33B模型发布，模型在HumanEval pass@1指标为78.65% (贪婪解码)。
+🔥🔥🔥 2023-11-10 开源了CodeFuse-CodeGeeX2-6B模型，在HumanEval pass@1(greedy decoding)上可以达到45.12%, 比CodeGeeX2提高了9.22%的代码能力（HumanEval）
+🔥🔥🔥 2023-10-20 公布了CodeFuse-QWen-14B技术文档，感兴趣详见微信公众号CodeFuse文章：https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw
+🔥🔥🔥 2023-10-16开源了CodeFuse-QWen-14B模型，在HumanEval pass@1(greedy decoding)上可以达到48.78%, 比Qwen-14b提高了16%的代码能力（HumanEval）
+🔥🔥🔥 2023-09-27开源了CodeFuse-StarCoder-15B模型，在HumanEval pass@1(greedy decoding)上可以达到54.9%, 比StarCoder提高了21%的代码能力（HumanEval）
+🔥🔥🔥 2023-09-26 [CodeFuse-CodeLlama-34B 4bits](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits/summary)量化版本发布，量化后模型在HumanEval pass@1指标为73.8% (贪婪解码)。
+🔥🔥🔥 2023-09-11 [CodeFuse-CodeLlama-34B](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B/summary)发布，HumanEval pass@1指标达到74.4% (贪婪解码), 为当前开源SOTA。
 <br>
 ## 代码社区
+**大本营**： 🏡 https://github.com/codefuse-ai （**请支持我们的项目Star🌟 + Fork🚀 + Watch👀**）
 + 如果您想自己微调该模型，可以访问 ✨[MFTCoder](https://github.com/codefuse-ai/MFTCoder)✨✨
 + 如果您想观看该模型示例，可以访问 ✨[CodeFuse Demo](https://github.com/codefuse-ai/codefuse)✨✨
+<br>
+## 评测表现
+### 代码
 | 模型                          | HumanEval(pass@1) |   日期    |
 |:----------------------------|:-----------------:|:-------:|
+| **CodeFuse-CodeLlama-34B**  |     74.4%      | 2023.9  |
+|**CodeFuse-CodeLlama-34B-4bits** |     73.8%  |  2023.9 |
 | WizardCoder-Python-34B-V1.0 |       73.2%       | 2023.8  |
 | GPT-4(zero-shot)            |       67.0%       | 2023.3  |
 | PanGu-Coder2 15B            |       61.6%       | 2023.8  |
 | GPT-3.5(zero-shot)          |       48.1%       | 2022.11 |
 | OctoCoder                   |       46.2%       | 2023.8  |
 | StarCoder-15B               |       33.6%       | 2023.5  |
+| Qwen-14b               |       32.3%       | 2023.10  |
+| **CodeFuse-StarCoder-15B**  |     54.9%     | 2023.9  |
+| **CodeFuse-QWen-14B**       |     48.78%     | 2023.10 |
+| **CodeFuse-CodeGeeX2-6B**   |     45.12%    | 2023.11 |
+| **CodeFuse-DeepSeek-33B**   |     **78.65%**    | 2024.01 |
 ## Requirements
 * python>=3.8
 * pytorch>=2.0.0
+* transformers>=4.33.2
+* Sentencepiece
 * CUDA 11.4
 <br>
 ## 推理数据格式
+推理数据为模型在训练数据格式下拼接的字符串形式，它也是推理时输入prompt拼接的方式. 下面分别是带系统提示的多轮会话格式和不带系统提示的单轮会话格式：
+**带System提示的多轮会话格式:**
 ```python
 """
+<s>system
+System instruction
+<s>human
+Human 1st round input
+<s>bot
+Bot 1st round output</s>
+<s>human
+Human 2nd round input
+<s>bot
+Bot 2nd round output</s>
 ...
 ...
 ...
+<s>human
+Human nth round input
+<s>bot
+"""
+```
+**不带System提示的单轮会话格式:**
+```python
+"""
+<s>human
+User prompt...
+<s>bot
 """
 ```
+在这个格式中，System提示是可选的（按需设定），支持单轮会话也支持多轮会话。推理时，请确保拼接的prompt字符串以"\<s\>bot\n"结尾，引导模型生成回答。
+例如，推理HumanEval数据时使用的格式如下所示：
 ```python
+<s>human
+# language: Python
+from typing import List
+def separate_paren_groups(paren_string: str) -> List[str]:
+    """ Input to this function is a string containing multiple groups of nested parentheses. Your goal is to
+    separate those group into separate strings and return the list of those.
+    Separate groups are balanced (each open brace is properly closed) and not nested within each other
+    Ignore any spaces in the input string.
+    >>> separate_paren_groups('( ) (( )) (( )( ))')
+    ['()', '(())', '(()())']
+    """
+<s>bot
+```
+特别地，我们也使用了CodeGeeX系列模型采用的编程语言区分标签（例如，对于Python语言，我们会使用"```# language: Python```"）。
+## 快速使用
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig
+def load_model_tokenizer(model_path):
+    tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True, use_fast=False, legacy=False)
+    tokenizer.eos_token = "</s>"
+    tokenizer.pad_token = "</s>"
+    tokenizer.eos_token_id = tokenizer.convert_tokens_to_ids(tokenizer.eos_token)
+    tokenizer.pad_token_id = tokenizer.convert_tokens_to_ids(tokenizer.pad_token)
+    tokenizer.padding_side = "left"
+    model = AutoModelForCausalLM.from_pretrained(model_path, device_map='auto',torch_dtype=torch.bfloat16, trust_remote_code=True)
+    return model, tokenizer
+HUMAN_ROLE_START_TAG = "<s>human\n"
+BOT_ROLE_START_TAG = "<s>bot\n"
+text_list = [f'{HUMAN_ROLE_START_TAG}请写一个快排程序\n#Python\n{BOT_ROLE_START_TAG}']
+model, tokenizer = load_model_tokenizer("codefuse-ai/CodeFuse-Mixtral-8x7b")
+inputs = tokenizer(text_list, return_tensors='pt', padding=True, add_special_tokens=False).to('cuda')
+input_ids = inputs["input_ids"]
+attention_mask = inputs["attention_mask"]
+generation_config = GenerationConfig(
+        eos_token_id=tokenizer.eos_token_id,
+        pad_token_id=tokenizer.pad_token_id,
+        temperature=0.2,
         max_new_tokens=512,
+        num_return_sequences=1,
+        num_beams=1,
         top_p=0.95,
+        do_sample=False
+)
+outputs = model.generate(
+        inputs= input_ids,
+        attention_mask=attention_mask,
+        **generation_config.to_dict()
+)
+gen_text = tokenizer.batch_decode(outputs[:, input_ids.shape[1]:], skip_special_tokens=True)
+print(gen_text[0])
 ```