Update README.md
README.md CHANGED
@@ -169,20 +169,6 @@ Hello! I'm Qwen, a large language model developed by Alibaba Cloud. While I don'

# Model Quality

-We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
-
-Install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness#install
-
-## Baseline
-```Shell
-lm_eval --model hf --model_args pretrained=Qwen/Qwen3-4B --tasks mmlu --device cuda:0 --batch_size auto
-```
-
-## int8 dynamic activation and int4 weight quantization (8da4w)
-```Shell
-lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-8da4w --tasks mmlu --device cuda:0 --batch_size auto
-```
-
| Benchmark                        | Qwen3-4B       | Qwen3-4B-8da4w            |
|----------------------------------|----------------|---------------------------|

@@ -197,9 +183,26 @@ lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-8da4w --tasks mmlu -
| mgsm_en_cot_en                   | 30.40          | 29.20                     |
| **Math**                         |                |                           |
| gsm8k                            | 84.76          | 82.87                     |
-| leaderboard_math_hard (v3)      | 48.19          |
-| **Overall**                     | 55.08          |
+| leaderboard_math_hard (v3)     | 48.19          | 44.94                     |
+| **Overall**                    | 55.08          | 52.01                     |

+<details>
+<summary>Reproduce Model Quality Results</summary>
+
+We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
+
+Install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness#install
+
+## Baseline
+```Shell
+lm_eval --model hf --model_args pretrained=Qwen/Qwen3-4B --tasks mmlu --device cuda:0 --batch_size auto
+```
+
+## int8 dynamic activation and int4 weight quantization (8da4w)
+```Shell
+lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-8da4w --tasks mmlu --device cuda:0 --batch_size auto
+```
+</details>
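
The same comparison can also be scripted through lm-eval's programmatic entry point. The sketch below is a minimal example mirroring the two CLI invocations above; it assumes a source install of lm-eval and its `simple_evaluate` API (keyword names can shift between versions):

```Python
# Minimal sketch: run the same MMLU evaluation from Python instead of the CLI.
# Assumes lm-eval is installed from source as described above.
import lm_eval

for model_id in ("Qwen/Qwen3-4B", "pytorch/Qwen3-4B-8da4w"):
    results = lm_eval.simple_evaluate(
        model="hf",                           # same backend as `--model hf`
        model_args=f"pretrained={model_id}",  # mirrors `--model_args`
        tasks=["mmlu"],
        device="cuda:0",
        batch_size="auto",
    )
    # results["results"] maps task/group names to metric dicts (e.g. accuracy)
    print(model_id, results["results"]["mmlu"])
```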
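For readers unfamiliar with the 8da4w scheme itself: activations are quantized dynamically to int8 at runtime, while weights are quantized to int4 in groups ahead of time. The sketch below shows how such a scheme could be applied with torchao's `quantize_` API; the config name (`Int8DynamicActivationInt4WeightConfig`) and group size are assumptions based on recent torchao releases, not necessarily the exact recipe behind pytorch/Qwen3-4B-8da4w:

```Python
# Illustrative sketch: apply int8-dynamic-activation / int4-weight (8da4w)
# quantization to the baseline model with torchao. Config name and group_size
# are assumptions; the released checkpoint may use a different recipe.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Weights are quantized in place to grouped int4; activations will be
# quantized to int8 dynamically at inference time.
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```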

# Exporting to ExecuTorch