jerryzh168 committed
Commit c568eae · verified · 1 parent: 5f05229

Update README.md

Files changed (1):
  README.md +9 −7
README.md CHANGED

@@ -219,7 +219,7 @@ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-h
 | Benchmark | | | |
 |----------------------------------|------------------------|-----------------------------|---------------------------------|
 | | google/gemma-3-12b-it | pytorch/gemma-3-12b-it-INT4 | pytorch/gemma-3-12b-it-AWQ-INT4 |
-| mmlu_abstract_algebra | 43 | 41 | 42 |
+| professional_law | TODO | 54.24 | TODO |


 <details>
@@ -250,7 +250,7 @@ lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --device cuda:0 -
 | Benchmark | | | |
 |----------------------------------|------------------------|-----------------------------|---------------------------------|
 | | google/gemma-3-12b-it | pytorch/gemma-3-12b-it-INT4 | pytorch/gemma-3-12b-it-AWQ-INT4 |
-| Peak Memory (GB) | 24.50 | 8.68 (65% reduction) | TODO |
+| Peak Memory (GB) | 24.50 | 8.57 (65% reduction) | 12.71 (48% reduction) |


 <details>
@@ -308,12 +308,14 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 ## Results (H100 machine)


-| Benchmark (Latency) | | | |
-|----------------------------------|------------------------|-----------------------------|---------------------------------|
-| | google/gemma-3-12b-it | pytorch/gemma-3-12b-it-INT4 | pytorch/gemma-3-12b-it-AWQ-INT4 |
-| latency (batch_size=1) | 3.73s | TODO (TODO% reduction) | TODO |
-| latency (batch_size=256) | TODO | TODO (TODO% reduction) | TODO |
+| Benchmark (Latency) | | | |
+|----------------------------------|------------------------|--------------------------------|---------------------------------|
+| | google/gemma-3-12b-it | jerryzh168/gemma-3-12b-it-INT4 | pytorch/gemma-3-12b-it-AWQ-INT4 |
+| latency (batch_size=1) | 3.73s | 2.76s (1.35x speedup) | 2.76s (1.35x speedup) |
+| latency (batch_size=256) | 13.63s | 14.32s (0.95x speedup) | 14.30s (0.95x speedup) |
+

+Note: jerryzh168/gemma-3-12b-it-INT4 is the H100-optimized checkpoint for INT4.
 <details>
 <summary> Reproduce Model Performance Results </summary>
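The reduction and speedup figures added in this diff follow from simple ratios against the bfloat16 baseline. A minimal sketch of the arithmetic (the helper names are illustrative, not from the repository):

```python
def reduction_pct(baseline_gb: float, quantized_gb: float) -> int:
    """Percent peak-memory reduction relative to the bfloat16 baseline."""
    return round(100 * (1 - quantized_gb / baseline_gb))

def speedup(baseline_s: float, quantized_s: float) -> float:
    """Latency speedup; values below 1 mean the quantized model is slower."""
    return round(baseline_s / quantized_s, 2)

print(reduction_pct(24.50, 8.57))   # INT4 peak memory: 65% reduction
print(reduction_pct(24.50, 12.71))  # AWQ-INT4 peak memory: 48% reduction
print(speedup(3.73, 2.76))          # batch_size=1 latency: 1.35x
print(speedup(13.63, 14.32))        # batch_size=256 latency: 0.95x
```

Note that the batch_size=256 "0.95x speedup" in the table is in fact a small slowdown, consistent with INT4 weight-only quantization mainly helping at small batch sizes where decoding is memory-bandwidth bound.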