Commit dc84333 (verified) · prince-canuma committed · 1 parent: 457945b

Add MLX example

Files changed (1)
  1. README.md +74 -0
README.md CHANGED
@@ -167,6 +167,80 @@ We look forward to your feedback and to collaborating with developers and resear
 
 Download the model from the HuggingFace repository: https://huggingface.co/MiniMaxAI/MiniMax-M2. We recommend using the following inference frameworks (listed alphabetically) to serve the model:
 
+
+ ### MLX
+
+ Run, serve, and fine-tune **MiniMax-M2** locally on your Mac using the **MLX** framework. This guide gets you up and running quickly.
+
+ > **Requirements**
+ > - Apple Silicon Mac (M3 Ultra or later)
+ > - **At least 256GB of unified memory (RAM)**
+
+ **Installation**
+
+ Install the `mlx-lm` package via pip:
+
+ ```bash
+ pip install mlx-lm
+ ```
+
+ **CLI**
+
+ Generate text directly from the terminal:
+
+ ```bash
+ mlx_lm.generate \
+   --model mlx-community/MiniMax-M2-4bit \
+   --prompt "How tall is Mount Everest?"
+ ```
+
+ > Add `--max-tokens 256` to control response length, or `--temp 0.7` for creativity.
+
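+ **Serving**
+
+ The intro above also mentions serving. As a minimal sketch, `mlx-lm` ships an OpenAI-compatible HTTP server via `mlx_lm.server`; the port and request below are illustrative defaults, so check `mlx_lm.server --help` for the options in your installed version:
+
+ ```bash
+ # Start an OpenAI-compatible server (defaults to 127.0.0.1:8080)
+ mlx_lm.server --model mlx-community/MiniMax-M2-4bit --port 8080
+
+ # Query the chat completions endpoint
+ curl http://localhost:8080/v1/chat/completions \
+   -H "Content-Type: application/json" \
+   -d '{"messages": [{"role": "user", "content": "How tall is Mount Everest?"}]}'
+ ```
+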
+ **Python Script Example**
+
+ Use `mlx-lm` in your own Python scripts:
+
+ ```python
+ from mlx_lm import load, generate
+ from mlx_lm.sample_utils import make_sampler
+
+ # Load the quantized model
+ model, tokenizer = load("mlx-community/MiniMax-M2-4bit")
+
+ prompt = "Hello, how are you?"
+
+ # Apply chat template if available (recommended for chat models)
+ if tokenizer.chat_template is not None:
+     messages = [{"role": "user", "content": prompt}]
+     prompt = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+ # Generate the response; recent mlx-lm releases configure sampling
+ # via a sampler (older releases accepted a bare temp= argument)
+ response = generate(
+     model,
+     tokenizer,
+     prompt=prompt,
+     max_tokens=256,
+     sampler=make_sampler(temp=0.7),
+     verbose=True
+ )
+
+ print(response)
+ ```
+
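+ For chat-style interactive output, here is a minimal streaming sketch using `stream_generate` (the chat-template step from the example above applies here as well). In recent `mlx-lm` releases each yielded chunk is a `GenerationResponse` whose `.text` field holds the newly generated text; older releases yielded plain strings:
+
+ ```python
+ from mlx_lm import load, stream_generate
+
+ model, tokenizer = load("mlx-community/MiniMax-M2-4bit")
+
+ # Print tokens as they are generated instead of waiting for the full response
+ for chunk in stream_generate(model, tokenizer, prompt="How tall is Mount Everest?", max_tokens=128):
+     print(chunk.text, end="", flush=True)
+ print()
+ ```
+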
+ **Tips**
+ - **Model variants**: Check [Hugging Face](https://huggingface.co/collections/mlx-community/minimax-m2) for `MiniMax-M2-4bit`, `6bit`, `8bit`, or `bfloat16` versions.
+ - **Fine-tuning**: Use `mlx_lm.lora` for parameter-efficient fine-tuning (LoRA); see the sketch after this list.
+
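+ As referenced in the fine-tuning tip, a minimal LoRA sketch; the dataset path and iteration count here are placeholder assumptions, and `mlx_lm.lora --help` lists the full set of options:
+
+ ```bash
+ # Train LoRA adapters; --data expects a directory containing train.jsonl and valid.jsonl
+ mlx_lm.lora \
+   --model mlx-community/MiniMax-M2-4bit \
+   --train \
+   --data ./my_dataset \
+   --iters 600
+ ```
+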
+ **Resources**
+ - GitHub: [https://github.com/ml-explore/mlx-lm](https://github.com/ml-explore/mlx-lm)
+ - Models: [https://huggingface.co/mlx-community](https://huggingface.co/mlx-community)
+
 ### SGLang
 
 We recommend using [SGLang](https://docs.sglang.ai/) to serve MiniMax-M2. SGLang provides solid day-0 support for the MiniMax-M2 model. Please refer to our [SGLang Deployment Guide](https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/docs/sglang_deploy_guide.md) for more details; many thanks to the SGLang team for the collaboration.