I'm getting an error when running this on vllm/vllm-openai:v0.11.0

#1 opened by derek-thomas
ubuntu@server:~/projects/vlm-exploration$ docker run --gpus all --rm -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.11.0 \
  --model OpenGVLab/InternVL3_5-8B-Flash \
  --served-model-name InternVL3_5-8B-Flash \
  --max-model-len 16384 \
  --trust_remote_code
INFO 10-04 21:11:48 [__init__.py:216] Automatically detected platform cuda.
(APIServer pid=1) INFO 10-04 21:11:52 [api_server.py:1839] vLLM API server version 0.11.0
(APIServer pid=1) INFO 10-04 21:11:52 [utils.py:233] non-default args: {'model': 'OpenGVLab/InternVL3_5-8B-Flash', 'trust_remote_code': True, 'max_model_len': 16384, 'served_model_name': ['InternVL3_5-8B-Flash']}
(APIServer pid=1) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=1) A new version of the following files was downloaded from https://huggingface.co/OpenGVLab/InternVL3_5-8B-Flash:
(APIServer pid=1) - configuration_intern_vit.py
(APIServer pid=1) . Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
(APIServer pid=1) INFO 10-04 21:12:02 [model.py:547] Resolved architecture: InternVLChatModel
(APIServer pid=1) `torch_dtype` is deprecated! Use `dtype` instead!
(APIServer pid=1) INFO 10-04 21:12:02 [model.py:1510] Using max model len 16384
(APIServer pid=1) INFO 10-04 21:12:02 [scheduler.py:205] Chunked prefill is enabled with max_num_batched_tokens=8192.
INFO 10-04 21:12:06 [__init__.py:216] Automatically detected platform cuda.
(EngineCore_DP0 pid=78) INFO 10-04 21:12:08 [core.py:644] Waiting for init message from front-end.
(EngineCore_DP0 pid=78) INFO 10-04 21:12:08 [core.py:77] Initializing a V1 LLM engine (v0.11.0) with config: model='OpenGVLab/InternVL3_5-8B-Flash', speculative_config=None, tokenizer='OpenGVLab/InternVL3_5-8B-Flash', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=16384, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=InternVL3_5-8B-Flash, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[512,504,496,488,480,472,464,456,448,440,432,424,416,408,400,392,384,376,368,360,352,344,336,328,320,312,304,296,288,280,272,264,256,248,240,232,224,216,208,200,192,184,176,168,160,152,144,136,128,120,112,104,96,88,80,72,64,56,48,40,32,24,16,8,4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":512,"local_cache_dir":null}
(EngineCore_DP0 pid=78) W1004 21:12:11.207000 78 torch/utils/cpp_extension.py:2425] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation. 
(EngineCore_DP0 pid=78) W1004 21:12:11.207000 78 torch/utils/cpp_extension.py:2425] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=78) INFO 10-04 21:12:12 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=78) INFO 10-04 21:12:12 [topk_topp_sampler.py:55] Using FlashInfer for top-p & top-k sampling.
(EngineCore_DP0 pid=78) WARNING 10-04 21:12:12 [__init__.py:2227] The following intended overrides are not keyword args and will be dropped: {'truncation'}
(EngineCore_DP0 pid=78) WARNING 10-04 21:12:12 [processing.py:1089] InternVLProcessor did not return `BatchFeature`. Make sure to match the behaviour of `ProcessorMixin` when implementing custom processors.
(EngineCore_DP0 pid=78) WARNING 10-04 21:12:12 [__init__.py:2227] The following intended overrides are not keyword args and will be dropped: {'truncation'}
(EngineCore_DP0 pid=78) Token indices sequence length is longer than the specified maximum sequence length for this model (16734 > 14588). Running this sequence through the model will result in indexing errors
(EngineCore_DP0 pid=78) INFO 10-04 21:12:12 [gpu_model_runner.py:2602] Starting to load model OpenGVLab/InternVL3_5-8B-Flash...
(EngineCore_DP0 pid=78) INFO 10-04 21:12:13 [gpu_model_runner.py:2634] Loading model from scratch...
(EngineCore_DP0 pid=78) INFO 10-04 21:12:13 [layer.py:444] MultiHeadAttention attn_backend: _Backend.FLASH_ATTN, use_upstream_fa: False
(EngineCore_DP0 pid=78) INFO 10-04 21:12:13 [cuda.py:366] Using Flash Attention backend on V1 engine.
(EngineCore_DP0 pid=78) INFO 10-04 21:12:13 [weight_utils.py:392] Using model weights format ['*.safetensors']
Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards:  25% Completed | 1/4 [00:00<00:01,  1.76it/s]
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.59it/s]
(EngineCore_DP0 pid=78) Process EngineCore_DP0:
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708] EngineCore failed to start.
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708] Traceback (most recent call last):
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     self._init_executor()
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     self.collective_rpc("load_model")
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     return func(*args, **kwargs)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     self.model = model_loader.load_model(
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     self.load_weights(model, model_config)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     loaded_weights = model.load_weights(
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]                      ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/internvl.py", line 1414, in load_weights
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     return loader.load_weights(weights)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708]     raise ValueError(msg)
(EngineCore_DP0 pid=78) ERROR 10-04 21:12:15 [core.py:708] ValueError: There is no module or parameter named 'gating' in InternVLChatModel
(EngineCore_DP0 pid=78) Traceback (most recent call last):
(EngineCore_DP0 pid=78)   File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=78)     self.run()
(EngineCore_DP0 pid=78)   File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=78)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 712, in run_engine_core
(EngineCore_DP0 pid=78)     raise e
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 699, in run_engine_core
(EngineCore_DP0 pid=78)     engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=78)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 498, in __init__
(EngineCore_DP0 pid=78)     super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 83, in __init__
(EngineCore_DP0 pid=78)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=78)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/executor_base.py", line 54, in __init__
(EngineCore_DP0 pid=78)     self._init_executor()
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 55, in _init_executor
(EngineCore_DP0 pid=78)     self.collective_rpc("load_model")
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/executor/uniproc_executor.py", line 83, in collective_rpc
(EngineCore_DP0 pid=78)     return [run_method(self.driver_worker, method, args, kwargs)]
(EngineCore_DP0 pid=78)             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 3122, in run_method
(EngineCore_DP0 pid=78)     return func(*args, **kwargs)
(EngineCore_DP0 pid=78)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
(EngineCore_DP0 pid=78)     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2635, in load_model
(EngineCore_DP0 pid=78)     self.model = model_loader.load_model(
(EngineCore_DP0 pid=78)                  ^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 50, in load_model
(EngineCore_DP0 pid=78)     self.load_weights(model, model_config)
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 264, in load_weights
(EngineCore_DP0 pid=78)     loaded_weights = model.load_weights(
(EngineCore_DP0 pid=78)                      ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/internvl.py", line 1414, in load_weights
(EngineCore_DP0 pid=78)     return loader.load_weights(weights)
(EngineCore_DP0 pid=78)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 294, in load_weights
(EngineCore_DP0 pid=78)     autoloaded_weights = set(self._load_module("", self.module, weights))
(EngineCore_DP0 pid=78)                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=78)   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/utils.py", line 280, in _load_module
(EngineCore_DP0 pid=78)     raise ValueError(msg)
(EngineCore_DP0 pid=78) ValueError: There is no module or parameter named 'gating' in InternVLChatModel
Loading safetensors checkpoint shards:  50% Completed | 2/4 [00:01<00:01,  1.09it/s]
(EngineCore_DP0 pid=78) 
[rank0]:[W1004 21:12:16.659336685 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1)   File "<frozen runpy>", line 198, in _run_module_as_main
(APIServer pid=1)   File "<frozen runpy>", line 88, in _run_code
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1953, in <module>
(APIServer pid=1)     uvloop.run(run_server(args))
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 109, in run
(APIServer pid=1)     return __asyncio.run(
(APIServer pid=1)            ^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1)     return runner.run(main)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1)     return self._loop.run_until_complete(task)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=1)     return await main
(APIServer pid=1)            ^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1884, in run_server
(APIServer pid=1)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 1902, in run_server_worker
(APIServer pid=1)     async with build_async_engine_client(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 180, in build_async_engine_client
(APIServer pid=1)     async with build_async_engine_client_from_engine_args(
(APIServer pid=1)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1)     return await anext(self.gen)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 225, in build_async_engine_client_from_engine_args
(APIServer pid=1)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/utils/__init__.py", line 1572, in inner
(APIServer pid=1)     return fn(*args, **kwargs)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 207, in from_vllm_config
(APIServer pid=1)     return cls(
(APIServer pid=1)            ^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 134, in __init__
(APIServer pid=1)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
(APIServer pid=1)     return AsyncMPClient(*client_args)
(APIServer pid=1)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 769, in __init__
(APIServer pid=1)     super().__init__(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 448, in __init__
(APIServer pid=1)     with launch_core_engines(vllm_config, executor_class,
(APIServer pid=1)          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1)   File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1)     next(self.gen)
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 732, in launch_core_engines
(APIServer pid=1)     wait_for_engine_startup(
(APIServer pid=1)   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 785, in wait_for_engine_startup
(APIServer pid=1)     raise RuntimeError("Engine core initialization failed. "
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
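
From the traceback, this fails during weight loading: the Flash checkpoint ships tensors named 'gating' that the InternVLChatModel implementation bundled with vLLM 0.11.0 does not define, so load_weights raises. A quick way to confirm the mismatch from the checkpoint side is to read the tensor names out of the safetensors index; this is just a diagnostic sketch, assuming the repo publishes the standard model.safetensors.index.json (consistent with the 4 shards in the log above):

```python
# Diagnostic sketch (assumes huggingface_hub is installed and the repo uses the
# standard sharded-safetensors layout): list checkpoint tensor names containing
# "gating" -- the keys that vLLM 0.11.0 fails to map onto InternVLChatModel.
import json

from huggingface_hub import hf_hub_download

index_path = hf_hub_download(
    repo_id="OpenGVLab/InternVL3_5-8B-Flash",
    filename="model.safetensors.index.json",
)
with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

print(sorted(name for name in weight_map if "gating" in name))
```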

I have the same error: ValueError: There is no module or parameter named 'gating' in InternVLChatModel

OpenGVLab org

Hi, thanks for your feedback! Our model supports deployment with lmdeploy: https://github.com/InternLM/lmdeploy
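
A minimal sketch of that route, assuming lmdeploy's documented VLM pipeline API (lmdeploy.pipeline and lmdeploy.vl.load_image) accepts this checkpoint the way it does other InternVL releases; the image URL is just an illustrative example:

```python
# Minimal lmdeploy sketch (not verified against this exact checkpoint):
# run one image-text query through the model instead of serving it via vLLM.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("OpenGVLab/InternVL3_5-8B-Flash")

# Any reachable image URL works here; this one is used in the lmdeploy docs.
image = load_image(
    "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg"
)
response = pipe(("Describe this image.", image))
print(response.text)
```

If you need an OpenAI-compatible endpoint like the one the docker command above was aiming for, lmdeploy's CLI also exposes one via `lmdeploy serve api_server OpenGVLab/InternVL3_5-8B-Flash` (listening on port 23333 by default).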
