jiaxin committed
Commit c4075e3 · 1 Parent(s): 8abcad7

update README
docs/sglang_deploy_guide_cn.md CHANGED
@@ -34,9 +34,12 @@

It is recommended to install SGLang in a fresh Python environment. Since this version has not been released yet, it must be compiled manually from source:
```bash
- git clone https://github.com/sgl-project/sglang.git
+ git clone -b v0.5.4.post3 https://github.com/sgl-project/sglang.git
cd sglang
- uv pip install ./python --torch-backend=auto
+
+ # Install the python packages
+ pip install --upgrade pip
+ pip install -e "python"
```

Run the following command to start the SGLang server. SGLang will automatically download and cache the MiniMax-M2 model from Hugging Face.
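The launch command itself falls outside this hunk. As a minimal sketch, assuming SGLang's standard `launch_server` entry point and an 8-GPU tensor-parallel setup (the guide's exact flags may differ):

```bash
# Sketch only: flags assume SGLang defaults; the guide's actual command may differ.
python -m sglang.launch_server \
    --model-path MiniMaxAI/MiniMax-M2 \
    --tp 8 \
    --trust-remote-code
```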
docs/vllm_deploy_guide.md CHANGED
@@ -42,6 +42,16 @@ uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly

Run the following command to start the vLLM server. vLLM will automatically download and cache the MiniMax-M2 model from Hugging Face.

+ 4-GPU deployment command:
+
+ ```bash
+ SAFETENSORS_FAST_GPU=1 vllm serve \
+ MiniMaxAI/MiniMax-M2 --trust-remote-code \
+ --tensor-parallel-size 4 \
+ --enable-auto-tool-choice --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think
+ ```
+
8-GPU deployment command:

```bash
@@ -49,8 +59,7 @@ SAFETENSORS_FAST_GPU=1 vllm serve \
MiniMaxAI/MiniMax-M2 --trust-remote-code \
--enable_expert_parallel --tensor-parallel-size 8 \
--enable-auto-tool-choice --tool-call-parser minimax_m2 \
- --reasoning-parser minimax_m2_append_think \
- --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"
+ --reasoning-parser minimax_m2_append_think
```

## Testing Deployment
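The Testing Deployment steps themselves are not part of these hunks. A minimal smoke-test sketch against vLLM's OpenAI-compatible API, assuming the default port 8000 and the served model name above:

```bash
# Sketch only: assumes vLLM's default host/port; adjust if the server was started differently.
curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
          "model": "MiniMaxAI/MiniMax-M2",
          "messages": [{"role": "user", "content": "Say hello."}]
        }'
```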
@@ -83,6 +92,18 @@ export HF_ENDPOINT=https://hf-mirror.com

This vLLM version is outdated. Please upgrade to the latest version.

+ ### torch.AcceleratorError: CUDA error: an illegal memory access was encountered
+ Add `--compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"` to the startup parameters to resolve this issue. For example:
+
+ ```bash
+ SAFETENSORS_FAST_GPU=1 vllm serve \
+ MiniMaxAI/MiniMax-M2 --trust-remote-code \
+ --enable_expert_parallel --tensor-parallel-size 8 \
+ --enable-auto-tool-choice --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think \
+ --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"
+ ```
+
## Getting Support

If you encounter any issues while deploying the MiniMax model:
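A quoting aside (editorial, not from the guide): the backslash escapes in the workaround are only needed because the JSON is wrapped in double quotes; single quotes pass the same string to vLLM. Both lines below print identical JSON:

```bash
# Both quoting styles hand the shell the same JSON payload.
echo "{\"cudagraph_mode\": \"PIECEWISE\"}"
echo '{"cudagraph_mode": "PIECEWISE"}'
```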
 
docs/vllm_deploy_guide_cn.md CHANGED
@@ -41,6 +41,16 @@ uv pip install vllm --extra-index-url https://wheels.vllm.ai/nightly

Run the following command to start the vLLM server. vLLM will automatically download and cache the MiniMax-M2 model from Hugging Face.

+ 4-GPU deployment command:
+
+ ```bash
+ SAFETENSORS_FAST_GPU=1 vllm serve \
+ MiniMaxAI/MiniMax-M2 --trust-remote-code \
+ --tensor-parallel-size 4 \
+ --enable-auto-tool-choice --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think
+ ```
+
8-GPU deployment command:

```bash
@@ -48,8 +58,7 @@ SAFETENSORS_FAST_GPU=1 vllm serve \
MiniMaxAI/MiniMax-M2 --trust-remote-code \
--enable_expert_parallel --tensor-parallel-size 8 \
--enable-auto-tool-choice --tool-call-parser minimax_m2 \
- --reasoning-parser minimax_m2_append_think \
- --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"
+ --reasoning-parser minimax_m2_append_think
```

## Testing Deployment
@@ -82,6 +91,18 @@ export HF_ENDPOINT=https://hf-mirror.com

This vLLM version is outdated. Please upgrade to the latest version.

+ ### torch.AcceleratorError: CUDA error: an illegal memory access was encountered
+ Adding `--compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"` to the startup parameters resolves this. For example:
+
+ ```bash
+ SAFETENSORS_FAST_GPU=1 vllm serve \
+ MiniMaxAI/MiniMax-M2 --trust-remote-code \
+ --enable_expert_parallel --tensor-parallel-size 8 \
+ --enable-auto-tool-choice --tool-call-parser minimax_m2 \
+ --reasoning-parser minimax_m2_append_think \
+ --compilation-config "{\"cudagraph_mode\": \"PIECEWISE\"}"
+ ```
+
## Getting Support

If you encounter any issues while deploying the MiniMax model:
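Both guides let the server download MiniMaxAI/MiniMax-M2 from Hugging Face on first start. As a hypothetical convenience (not a step the guides prescribe), the weights can be pre-fetched with the Hugging Face CLI, optionally through the hf-mirror endpoint mentioned in the troubleshooting context:

```bash
# Hypothetical pre-fetch; the guides simply let vLLM/SGLang download on first run.
# export HF_ENDPOINT=https://hf-mirror.com   # mirror from the guides' FAQ, if needed
huggingface-cli download MiniMaxAI/MiniMax-M2
```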