vllm (docker) not supported
#17 · opened by 1q1q11
When using the vllm/vllm-openai:v0.10.1.1 (latest) image to deploy tencent/Hunyuan-MT-7B, it reports that transformers needs to be updated. Why is the latest image incompatible?
I managed to launch this model in vllm using the instructions from GitHub.
The problem is that this model requires transformers from a specific commit, so I used vllm version 0.10.0 and executed the command:
pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
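As an optional sanity check: pip records the exact commit for VCS installs, so you can confirm the pin took effect before starting the server. Expected output is along these lines (not guaranteed verbatim):

pip freeze | grep transformers
# transformers @ git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca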
After that, I ran it with the recommended parameters:
python3 -m vllm.entrypoints.openai.api_server \
--host 0.0.0.0 \
--port 8000 \
--trust-remote-code \
--model tencent/Hunyuan-MT-7B \
--tensor-parallel-size 1 \
--dtype bfloat16 \
--quantization experts_int8 \
--served-model-name Hunyuan-MT-7B
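Once the server is up, a standard OpenAI-style request works as a smoke test. This is just an illustration: localhost:8000 matches the flags above, and the prompt is a placeholder (check the model card for the exact prompt template Hunyuan-MT expects):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Hunyuan-MT-7B",
        "messages": [
          {"role": "user", "content": "Translate the following segment into English: 你好，世界"}
        ]
      }'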
So I built a custom Docker image with this Dockerfile:
FROM vllm/vllm-openai:v0.10.0
RUN pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
and built it with:
docker build -t vllm/vllm-openai:v0.10.0-hunyuan .
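Before wiring the image into Compose, it may be worth confirming it actually carries the pinned dev build. The base image's entrypoint is the API server itself, so it has to be overridden for a one-off check (a dev version string rather than a plain release number suggests the commit install stuck):

docker run --rm --entrypoint python3 vllm/vllm-openai:v0.10.0-hunyuan \
  -c "import transformers; print(transformers.__version__)"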
And then launched it with this Compose file:
services:
  vllm:
    image: vllm/vllm-openai:v0.10.0-hunyuan
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - ./models/:/root/.cache/huggingface/hub/
    environment:
      - TRANSFORMERS_OFFLINE=1
    command: >
      --host 0.0.0.0
      --port 8000
      --trust-remote-code
      --model tencent/Hunyuan-MT-7B
      --tensor-parallel-size 1
      --dtype bfloat16
      --quantization experts_int8
      --served-model-name Hunyuan-MT-7B
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
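With that in place, bringing the stack up and listing the served models is enough to confirm everything is wired together (port and served name assumed from the file above):

docker compose up -d
curl http://localhost:8000/v1/models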