Update README.md
README.md CHANGED
@@ -169,20 +169,6 @@ Hello! I'm Qwen, a large language model developed by Alibaba Cloud. While I don'

# Model Quality

-We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
-
-Install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness#install
-
-## Baseline
-```Shell
-lm_eval --model hf --model_args pretrained=Qwen/Qwen3-4B --tasks mmlu --device cuda:0 --batch_size auto
-```
-
-## int8 dynamic activation and int4 weight quantization (8da4w)
-```Shell
-lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-8da4w --tasks mmlu --device cuda:0 --batch_size auto
-```
-
| Benchmark                        | Qwen3-4B       | Qwen3-4B-8da4w            |
|----------------------------------|----------------|---------------------------|

@@ -197,9 +183,26 @@ lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-8da4w --tasks mmlu -
| mgsm_en_cot_en                   | 30.40          | 29.20                     |
| **Math**                         |                |                           |
| gsm8k                            | 84.76          | 82.87                     |
-| leaderboard_math_hard (v3)      | 48.19          |
-| **Overall**                     | 55.08          |
+| leaderboard_math_hard (v3)     | 48.19          | 44.94                     |
+| **Overall**                    | 55.08          | 52.01                     |

+<details>
+<summary>Reproduce Model Quality Results</summary>
+
+We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
+
+Install lm-eval from source: https://github.com/EleutherAI/lm-evaluation-harness#install
+
+## Baseline
+```Shell
+lm_eval --model hf --model_args pretrained=Qwen/Qwen3-4B --tasks mmlu --device cuda:0 --batch_size auto
+```
+
+## int8 dynamic activation and int4 weight quantization (8da4w)
+```Shell
+lm_eval --model hf --model_args pretrained=pytorch/Qwen3-4B-8da4w --tasks mmlu --device cuda:0 --batch_size auto
+```
+</details>
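
The same comparison can also be scripted through lm-eval's programmatic entry point. The sketch below is a minimal example mirroring the two CLI invocations above; it assumes a source install of lm-eval and its `simple_evaluate` API (keyword names can shift between versions):

```Python
# Minimal sketch: run the same MMLU evaluation from Python instead of the CLI.
# Assumes lm-eval is installed from source as described above.
import lm_eval

for model_id in ("Qwen/Qwen3-4B", "pytorch/Qwen3-4B-8da4w"):
    results = lm_eval.simple_evaluate(
        model="hf",                           # same backend as `--model hf`
        model_args=f"pretrained={model_id}",  # mirrors `--model_args`
        tasks=["mmlu"],
        device="cuda:0",
        batch_size="auto",
    )
    # results["results"] maps task/group names to metric dicts (e.g. accuracy)
    print(model_id, results["results"]["mmlu"])
```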
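For readers unfamiliar with the 8da4w scheme itself: activations are quantized dynamically to int8 at runtime, while weights are quantized to int4 in groups ahead of time. The sketch below shows how such a scheme could be applied with torchao's `quantize_` API; the config name (`Int8DynamicActivationInt4WeightConfig`) and group size are assumptions based on recent torchao releases, not necessarily the exact recipe behind pytorch/Qwen3-4B-8da4w:

```Python
# Illustrative sketch: apply int8-dynamic-activation / int4-weight (8da4w)
# quantization to the baseline model with torchao. Config name and group_size
# are assumptions; the released checkpoint may use a different recipe.
import torch
from transformers import AutoModelForCausalLM
from torchao.quantization import quantize_, Int8DynamicActivationInt4WeightConfig

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B",
    torch_dtype=torch.bfloat16,
    device_map="cuda:0",
)

# Weights are quantized in place to grouped int4; activations will be
# quantized to int8 dynamically at inference time.
quantize_(model, Int8DynamicActivationInt4WeightConfig(group_size=32))
```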

# Exporting to ExecuTorch