Update README.md

Here are some of the optimized configurations we have added:

1. ONNX model for int4 CPU and mobile: runs on CPU and mobile devices using int4 quantization via RTN (round-to-nearest; see the sketch after this list).
2. ONNX model for int4 CUDA and DML GPU devices: runs on CUDA and DirectML GPUs using int4 quantization via RTN.
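
RTN (round-to-nearest) is about the simplest weight-quantization scheme: each block of weights gets one scale and zero-point, and every weight is scaled into the int4 range and rounded. Below is a minimal NumPy illustration of blockwise int4 RTN with block size 32 (matching the `block-32` in the variant names); it is a hypothetical sketch for intuition, not the quantizer used to produce these models.

```python
# Hypothetical sketch of blockwise int4 RTN quantization -- for intuition only,
# not the tool that produced these ONNX models.
import numpy as np

def rtn_quantize_int4(weights: np.ndarray, block_size: int = 32):
    """Quantize a flat float array to uint4 codes with per-block scale/zero-point."""
    w = weights.reshape(-1, block_size)
    w_min = w.min(axis=1, keepdims=True)
    w_max = w.max(axis=1, keepdims=True)
    scale = np.maximum((w_max - w_min) / 15.0, 1e-8)  # int4 codes span 0..15
    zero_point = np.round(-w_min / scale)
    codes = np.clip(np.round(w / scale) + zero_point, 0, 15).astype(np.uint8)
    return codes, scale, zero_point

def rtn_dequantize(codes, scale, zero_point):
    """Map uint4 codes back to approximate float weights."""
    return (codes.astype(np.float32) - zero_point) * scale

w = np.random.randn(128).astype(np.float32)
codes, scale, zp = rtn_quantize_int4(w)
w_hat = rtn_dequantize(codes, scale, zp).ravel()
print("max abs error:", np.abs(w - w_hat).max())  # bounded by scale/2 per block
```
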
## Model Run

You can see how to run examples with ORT GenAI [here](https://github.com/microsoft/onnxruntime-genai/blob/main/examples/python/phi-3-tutorial.md).

For CPU:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download microsoft/Phi-4-mini-instruct-onnx --include cpu_and_mobile/* --local-dir .

# Install the CPU package of ONNX Runtime GenAI
pip install onnxruntime-genai

# Please adjust the model directory (-m) accordingly
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
python phi3-qa.py -m cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4 -e cpu
```
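
If you would rather call the runtime directly than go through phi3-qa.py, the loop below is a minimal sketch of the onnxruntime-genai Python API. The exact surface has shifted between releases (`Generator.append_tokens` assumes a recent version), so treat phi3-qa.py as the maintained reference.

```python
import onnxruntime_genai as og

# Point this at whichever variant you downloaded above.
model = og.Model("cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4")
tokenizer = og.Tokenizer(model)

# Phi-style chat template; check the model card for the exact format.
prompt = "<|user|>\nWhat is ONNX Runtime?<|end|>\n<|assistant|>\n"

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode(prompt))
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```
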

For CUDA:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download microsoft/Phi-4-mini-instruct-onnx --include gpu/* --local-dir .

# Install the CUDA package of ONNX Runtime GenAI
pip install onnxruntime-genai-cuda

# Please adjust the model directory (-m) accordingly
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
python phi3-qa.py -m gpu/gpu-int4-rtn-block-32 -e cuda
```

For DirectML:

```bash
# Download the model directly using the Hugging Face CLI
huggingface-cli download microsoft/Phi-4-mini-instruct-onnx --include gpu/* --local-dir .

# Install the DirectML package of ONNX Runtime GenAI
pip install onnxruntime-genai-directml

# Please adjust the model directory (-m) accordingly
curl https://raw.githubusercontent.com/microsoft/onnxruntime-genai/main/examples/python/phi3-qa.py -o phi3-qa.py
python phi3-qa.py -m gpu/gpu-int4-rtn-block-32 -e dml
```
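
Since the same `gpu/gpu-int4-rtn-block-32` folder serves both CUDA and DirectML, a quick way to compare execution providers is to time the generation loop. This is a rough sketch under the same API assumptions as above; `tokens_per_second` is a hypothetical helper, not part of this repo or of onnxruntime-genai.

```python
import time
import onnxruntime_genai as og

def tokens_per_second(model_dir: str, prompt: str, max_length: int = 256) -> float:
    """Rough end-to-end throughput estimate (hypothetical helper).

    Which execution provider actually runs is determined by the installed
    onnxruntime-genai package and the model's genai_config, not by this code.
    """
    model = og.Model(model_dir)
    tokenizer = og.Tokenizer(model)
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=max_length)
    generator = og.Generator(model, params)
    generator.append_tokens(tokenizer.encode(prompt))
    start, generated = time.perf_counter(), 0
    while not generator.is_done():
        generator.generate_next_token()
        generated += 1
    return generated / (time.perf_counter() - start)

print(tokens_per_second("gpu/gpu-int4-rtn-block-32",
                        "<|user|>\nHello!<|end|>\n<|assistant|>\n"))
```
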

## Model Description
- Developed by: Microsoft
- Model type: ONNX