Commit f3ef13f (parent: 96a5535)
add vllm lastest image doc.

Files changed: README.md (+3 −7), README_CN.md (+4 −8)
README.md

````diff
@@ -227,9 +227,7 @@ We provide a pre-built Docker image containing vLLM 0.8.5 with full support for
 To get started:
 
 ```
-docker pull
-or
-docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
+docker pull hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1
 ```
 
 Download Model file:
@@ -247,8 +245,7 @@ docker run --rm --ipc=host \
 --net=host \
 --gpus=all \
 -it \
-
---entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
+--entrypoint python3 hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1 \
 -m vllm.entrypoints.openai.api_server \
 --host 0.0.0.0 \
 --tensor-parallel-size 4 \
@@ -265,8 +262,7 @@ docker run --rm --ipc=host \
 --net=host \
 --gpus=all \
 -it \
-
---entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
+--entrypoint python3 hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1 \
 -m vllm.entrypoints.openai.api_server \
 --host 0.0.0.0 \
 --tensor-parallel-size 4 \
````
README_CN.md

The Chinese README receives the same image update; the added line translates as: "We provide a Docker image based on the official vLLM 0.8.5 release for quick deployment and testing. **Note: this image requires CUDA 12.4.**"

````diff
@@ -180,14 +180,12 @@ print(response)
 
 ### Docker 镜像
 
-
+我们提供了一个基于官方 vLLM 0.8.5 版本的 Docker 镜像方便快速部署和测试。**注意：该镜像要求使用 CUDA 12.4 版本。**
 
 快速开始方式如下：
 
 ```
-docker pull
-或
-docker pull hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm
+docker pull hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1
 ```
 
 下载模型文件：
@@ -203,8 +201,7 @@ docker run --rm --ipc=host \
 --net=host \
 --gpus=all \
 -it \
-
---entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
+--entrypoint python3 hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1 \
 -m vllm.entrypoints.openai.api_server \
 --host 0.0.0.0 \
 --tensor-parallel-size 4 \
@@ -222,8 +219,7 @@ docker run --rm --ipc=host \
 --net=host \
 --gpus=all \
 -it \
-
---entrypoint python hunyuaninfer/hunyuan-a13b:hunyuan-moe-A13B-vllm \
+--entrypoint python3 hunyuaninfer/hunyuan-infer-vllm-cuda12.4:v1 \
 -m vllm.entrypoints.openai.api_server \
 --host 0.0.0.0 \
 --tensor-parallel-size 4 \
````