Update README.md
README.md
CHANGED
@@ -295,6 +295,7 @@ Note the result of latency (benchmark_latency) is in seconds, and serving (bench
 Int4 weight only is optimized for batch size 1 and short input and output token length, please stay tuned for models optimized for larger batch sizes or longer token length.
 <details>
 <summary> Reproduce Model Performance Results </summary>
+
 ## Setup
 
 Get vllm source code: