jerryzh168 committed on
Commit 32d06e3 · verified · 1 Parent(s): 428e083

Update README.md

Files changed (1): README.md (+10 −10)
README.md CHANGED

```diff
@@ -251,10 +251,10 @@ lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --device cuda:0 -
 
 ## Results
 
-| Benchmark | | |
-|------------------|----------------|--------------------------------|
-| | microsoft/Phi-4-mini-instruct | pytorch/Phi-4-mini-instruct-AWQ-INT4 |
-| Peak Memory (GB) | 8.91 | 3.95 (55.67% reduction) |
+| Benchmark | | | |
+|------------------|----------------|--------------------------------|--------------------------------|
+| | microsoft/Phi-4-mini-instruct | pytorch/Phi-4-mini-instruct-INT4 | pytorch/Phi-4-mini-instruct-AWQ-INT4 |
+| Peak Memory (GB) | 8.91 | 2.98 (67% reduction) | 3.95 (55.67% reduction) |
 
 
 
@@ -311,15 +311,15 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
 # Model Performance
 
 ## Results (H100 machine)
-| Benchmark (Latency) | | |
-|----------------------------------|----------------|--------------------------|
-| | microsoft/Phi-4-mini-instruct | pytorch/Phi-4-mini-instruct-AWQ-INT4 |
-| latency (batch_size=1) | 1.60s | 1.37s (1.17x speedup) |
-| latency (batch_size=256) | 5.47s | 5.55s (0.98x speedup) |
-
+| Benchmark (Latency) | | | |
+|----------------------------------|----------------|--------------------------|--------------------------|
+| | microsoft/Phi-4-mini-instruct | jerryzh168/Phi-4-mini-instruct-INT4 | pytorch/Phi-4-mini-instruct-AWQ-INT4 |
+| latency (batch_size=1) | 1.60s | TODO | 1.37s (1.17x speedup) |
+| latency (batch_size=256) | 5.47s | TODO | 5.55s (0.98x speedup) |
 
 Note: the AWQ-INT4 checkpoint is expected to be slower at batch size 256: at larger batch sizes the problem is no longer memory bound but becomes compute bound, and the INT4 weight-only checkpoint is only expected to show a speedup in memory-bound situations.
+Note: we compare against jerryzh168/Phi-4-mini-instruct-INT4, a checkpoint built for H100, since the AWQ-INT4 checkpoint uses the new INT4 config that is optimized for H100.
 
 <details>
 <summary> Reproduce Model Performance Results </summary>
```
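The derived figures in the tables above (memory-reduction percentages and latency speedups) follow directly from the raw measurements. A minimal sketch that recomputes them from the numbers in the diff — the helper names are mine, not from the README:

```python
def reduction_pct(baseline_gb: float, quantized_gb: float) -> float:
    """Percent peak-memory reduction relative to the bf16 baseline."""
    return (baseline_gb - quantized_gb) / baseline_gb * 100

def speedup(baseline_s: float, quantized_s: float) -> float:
    """Latency speedup of the quantized checkpoint over the baseline."""
    return baseline_s / quantized_s

# Raw measurements from the tables: 8.91 GB / 1.60 s / 5.47 s (baseline),
# 2.98 GB (INT4), 3.95 GB / 1.37 s / 5.55 s (AWQ-INT4).
print(f"AWQ-INT4 memory reduction: {reduction_pct(8.91, 3.95):.2f}%")  # 55.67%
print(f"INT4 memory reduction:     {reduction_pct(8.91, 2.98):.0f}%")  # 67%
print(f"batch_size=1 speedup:      {speedup(1.60, 1.37):.2f}x")        # 1.17x
print(f"batch_size=256 speedup:    {speedup(5.47, 5.55):.2f}x")        # 0.99x
```

Note the batch_size=256 ratio rounds to 0.99x; the table reports it truncated to 0.98x, i.e. a slight slowdown either way.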