vllm (docker) not supported
#17 · opened by 1q1q11
When using the vllm/vllm-openai:v0.10.1.1 (latest) image to deploy tencent/Hunyuan-MT-7B, it reports that transformers needs to be updated. Why is the latest image incompatible?
I managed to launch this model in vllm using the instructions from GitHub.
The problem is that this model requires transformers from a specific commit, so I used vllm version 0.10.0 and executed the command:
pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
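As an optional sanity check: pip records the exact commit for VCS installs, so you can confirm the pin took effect before starting the server. Expected output is along these lines (not guaranteed verbatim):

pip freeze | grep transformers
# transformers @ git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca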
After that, I ran it with the recommended parameters:
python3 -m vllm.entrypoints.openai.api_server \
--host 0.0.0.0 \
--port 8000 \
--trust-remote-code \
--model tencent/Hunyuan-MT-7B \
--tensor-parallel-size 1 \
--dtype bfloat16 \
--quantization experts_int8 \
--served-model-name Hunyuan-MT-7B
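Once the server is up, a standard OpenAI-style request works as a smoke test. This is just an illustration: localhost:8000 matches the flags above, and the prompt is a placeholder (check the model card for the exact prompt template Hunyuan-MT expects):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Hunyuan-MT-7B",
        "messages": [
          {"role": "user", "content": "Translate the following segment into English: 你好，世界"}
        ]
      }'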
So I built a custom Docker image with this Dockerfile:
FROM vllm/vllm-openai:v0.10.0
RUN pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
and built it with:
docker build -t vllm/vllm-openai:v0.10.0-hunyuan .
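Before wiring the image into Compose, it may be worth confirming it actually carries the pinned dev build. The base image's entrypoint is the API server itself, so it has to be overridden for a one-off check (a dev version string rather than a plain release number suggests the commit install stuck):

docker run --rm --entrypoint python3 vllm/vllm-openai:v0.10.0-hunyuan \
  -c "import transformers; print(transformers.__version__)"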
And then launched it with this Compose file:
services:
  vllm:
    image: vllm/vllm-openai:v0.10.0-hunyuan
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - ./models/:/root/.cache/huggingface/hub/
    environment:
      - TRANSFORMERS_OFFLINE=1
    command: >
      --host 0.0.0.0
      --port 8000
      --trust-remote-code
      --model tencent/Hunyuan-MT-7B
      --tensor-parallel-size 1
      --dtype bfloat16
      --quantization experts_int8
      --served-model-name Hunyuan-MT-7B
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
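With that in place, bringing the stack up and listing the served models is enough to confirm everything is wired together (port and served name assumed from the file above):

docker compose up -d
curl http://localhost:8000/v1/models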