Update README.md

README.md (changed):

@@ -59,10 +59,9 @@ def benchmark_fn(f, *args, **kwargs):
 torchao.quantization.utils.recommended_inductor_config_setter()
 quantized_model = torch.compile(quantized_model, mode="max-autotune")
 print(f"{save_to} model:", benchmark_fn(quantized_model.generate, **inputs, max_new_tokens=128))
-
+```
 # Model Quality
 We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the quality of the quantized model.
-```
 
 # Installing the nightly version to get most recent updates
 ```
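Note on the hunk above: its header shows the snippet sits below a `benchmark_fn(f, *args, **kwargs)` helper defined earlier in the README, which the `print` line uses to time `quantized_model.generate`. A minimal sketch of such a timer, assuming CUDA events and a few warm-up calls rather than the README's exact implementation:

```python
import torch

def benchmark_fn(f, *args, **kwargs):
    # Warm-up runs: the first calls after torch.compile(mode="max-autotune")
    # include compilation and autotuning time, which should not be measured.
    for _ in range(3):
        f(*args, **kwargs)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    f(*args, **kwargs)
    end.record()
    torch.cuda.synchronize()
    # elapsed_time reports milliseconds between the two recorded events.
    return f"{start.elapsed_time(end):.2f} ms"
```

The warm-up calls matter because the first invocations of a model compiled with `mode="max-autotune"` pay the compilation and autotuning cost up front.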
@@ -119,7 +118,7 @@ python benchmarks/benchmark_latency.py --input-len 256 --output-len 256 --model
 
 # benchmark_serving
 
-We also benchmarked the throughput
+We also benchmarked the throughput in a serving environment.
 
 ## baseline
 Server:
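The baseline "Server:" command that follows in the README is not shown in this hunk; assuming it launches an OpenAI-compatible inference server (for example vLLM, which also ships the `benchmark_latency.py` and `benchmark_serving` scripts referenced here), a quick sanity check of the endpoint before starting the throughput run can look like the sketch below. The port and model name are placeholders:

```python
import requests

# Placeholder endpoint and model name; adjust to match the actual "Server:"
# command used for the baseline (assumed to expose an OpenAI-compatible API).
resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "your-model-name",
        "prompt": "Hello, my name is",
        "max_tokens": 32,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```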