vLLM 0.11.2 AutoTune Bug: Missing Import in TorchInductor
Hey everyone! We've hit an issue in vLLM 0.11.2 that we want to share and discuss. When you use the AutoTune feature with the TorchInductor backend and set compile_sizes to a value greater than 1 (for example, compile_sizes=[2]), you can run into a rather cryptic RuntimeError. Specifically, the error message is: Failed to run autotuning code block: name 'get_raw_stream' is not defined.
This usually pops up while vLLM is compiling and autotuning Triton kernels generated by TorchInductor. After a bit of digging, it appears that the code TorchInductor generates calls a function named get_raw_stream before that name has been imported within the same code block. It's like reaching for a tool before taking it out of the toolbox! We've even seen snippets in the torch logs that look like this:
with torch.cuda._DeviceGuard(7):
    stream7 = get_raw_stream(7)
from torch._C import _cuda_getCurrentRawStream as get_raw_stream
stream7 = get_raw_stream(7)
This shows the function being called before the import that defines it has run. It's a classic use-before-definition, order-of-operations hiccup in the generated code, not something timing-related.
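To make the failure mode concrete, here is a minimal sketch that reproduces the same kind of error without torch or vLLM. It is only an analogy: the generated_block string and the run_autotune_block helper are hypothetical stand-ins, and math.sqrt plays the role of get_raw_stream. The only detail taken from the real trace is the wording of the wrapping RuntimeError.

```python
# Hypothetical stand-in for a generated autotuning block: the name is
# called on the first line but only imported on the second.
generated_block = """
value = sqrt(16.0)
from math import sqrt
value = sqrt(16.0)
"""


def run_autotune_block(source: str) -> dict:
    """Execute a generated code string in a fresh namespace."""
    namespace: dict = {}
    try:
        exec(source, namespace)
    except Exception as e:
        # Same wrapping message as the final frame of the reported trace.
        raise RuntimeError(f"Failed to run autotuning code block: {e}") from e
    return namespace


message = ""
try:
    run_autotune_block(generated_block)
except RuntimeError as err:
    message = str(err)
    print(message)

# Moving the import ahead of the first call lets the same block run.
fixed_block = "from math import sqrt\nvalue = sqrt(16.0)\n"
fixed_value = run_autotune_block(fixed_block)["value"]
print(fixed_value)
```

The first call fails on its very first statement, so the later import never gets a chance to run. That matches the shape of the log snippet, where the import sits after the first get_raw_stream call.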
Reproducing the Issue: A Simple Test Case
To help illustrate and reproduce this bug, we've put together a straightforward Python script. You can try this out with the Mixtral model, which seems to be particularly susceptible to this issue when these specific conditions are met. Here’s the code:
from vllm import LLM, SamplingParams
from vllm.config import CompilationConfig

prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# compile_sizes=[2] is the setting that triggers the failure.
llm = LLM(
    model="mistralai/Mixtral-8x7B-v0.1",
    compilation_config=CompilationConfig(compile_sizes=[2]),
)
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
When you run this script with vLLM 0.11.2 and the specified model, you'll likely hit the RuntimeError described above. The full error trace, reproduced below, is long, but the core issue boils down to that missing get_raw_stream definition when TorchInductor generates and runs its autotuning code block.
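Since every line of the trace carries the same process and logger prefix, a small filter can make it easier to spot the actual exception. The sketch below is our own helper, not part of vLLM or torch: the names PREFIX, ERROR_LINE, strip_prefix and error_lines are hypothetical, and the regular expressions assume the prefix format seen in this particular log.

```python
import re

# Prefix seen on each trace line: one or more "(Name pid=N)" tags, optionally
# followed by "ERROR MM-DD HH:MM:SS [file.py:LINE]".
PREFIX = re.compile(
    r"^(?:\([\w-]+ pid=\d+\)\s*)+"
    r"(?:ERROR \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[\w.]+:\d+\]\s?)?"
)
# A stripped line that starts with an exception class name and a colon.
ERROR_LINE = re.compile(r"^[\w.]*(?:Error|Exception): ")


def strip_prefix(line: str) -> str:
    """Remove the process tags and logger header from one log line."""
    return PREFIX.sub("", line.rstrip("\n"))


def error_lines(lines):
    """Return stripped lines that look like 'SomeError: message'."""
    stripped = (strip_prefix(line) for line in lines)
    return [text for text in stripped if ERROR_LINE.match(text)]


# Three lines copied from the trace in this post.
sample = [
    "(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 "
    "[multiproc_executor.py:815] WorkerProc hit an exception.",
    "(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) SystemExit:",
    "(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 "
    "[multiproc_executor.py:815] torch._inductor.exc.InductorError: "
    "RuntimeError: Failed to run autotuning code block: "
    "name 'get_raw_stream' is not defined",
]
found = error_lines(sample)
print(found)
```

Run over a saved copy of the full log, error_lines should surface the InductorError line without the surrounding frames, provided the log uses the same prefix layout.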
Full Error Trace
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) Exception ignored in: <function ExactWeakKeyDictionary.__setitem__.<locals>.<lambda> at 0x7fb312a02480>
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) Traceback (most recent call last):
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/utils.py", line 988, in <lambda>
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) self.refs[idx] = weakref.ref(key, lambda ref: self._remove_id(idx))
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 682, in signal_handler
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) raise SystemExit()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) SystemExit:
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] WorkerProc hit an exception.
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] Traceback (most recent call last):
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 810, in worker_busy_loop
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] output = func(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 429, in compile_or_warm_up_model
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] cuda_graph_memory_bytes = self.model_runner.capture_model()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4201, in capture_model
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._capture_cudagraphs(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4298, in _capture_cudagraphs
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._dummy_run(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return func(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3855, in _dummy_run
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] outputs = self.model(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mixtral.py", line 604, in forward
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] hidden_states = self.model(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 399, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 152, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.forward(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mixtral.py", line 351, in forward
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] def forward(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return fn(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 53, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.optimized_call(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise e
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "<eval_with_key>.66", line 177, in forward
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] submod_2 = self.submod_2(getitem_3, s72, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, getitem_4, l_self_modules_layers_modules_0_modules_block_sparse_moe_modules_gate_parameters_weight_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = getitem_4 = l_self_modules_layers_modules_0_modules_block_sparse_moe_modules_gate_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 107, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] entry.runnable = self.vllm_backend.compiler_manager.compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 233, in compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_graph, handle = self.compiler.compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 232, in compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_graph = standalone_compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 422, in standalone_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return standalone_compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 252, in standalone_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_fn = compile_fx(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2413, in compile_fx
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return compile_fx(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2695, in compile_fx
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 990, in _compile_fx_inner
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise InductorError(e, currentframe()).with_traceback(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 974, in _compile_fx_inner
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] mb_compiled_graph = fx_codegen_and_compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1695, in fx_codegen_and_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1505, in codegen_and_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_module = graph.compile_to_module()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2319, in compile_to_module
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._compile_to_module()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2325, in _compile_to_module
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2264, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.scheduler.codegen()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 5205, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._codegen_partitions()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 5345, in _codegen_partitions
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._codegen(partition)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 5430, in _codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.codegen_extern_call(node)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 4588, in codegen_extern_call
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] node.codegen(V.graph.wrapper_code)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/ir.py", line 6671, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] wrapper.codegen_subgraph_with_flattened_outputs(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 3314, in codegen_subgraph_with_flattened_outputs
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.codegen_subgraph_common(subgraph)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 3307, in codegen_subgraph_common
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] subgraph_code, _ = subgraph.graph.codegen()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2271, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] result = self.wrapper_code.generate(self.is_inference)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1552, in generate
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._generate(is_inference)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1615, in _generate
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.generate_and_run_autotune_block()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1695, in generate_and_run_autotune_block
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise RuntimeError(f"Failed to run autotuning code block: {e}") from e
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] torch._inductor.exc.InductorError: RuntimeError: Failed to run autotuning code block: name 'get_raw_stream' is not defined
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] Traceback (most recent call last):
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 810, in worker_busy_loop
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] output = func(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 429, in compile_or_warm_up_model
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] cuda_graph_memory_bytes = self.model_runner.capture_model()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4201, in capture_model
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._capture_cudagraphs(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 4298, in _capture_cudagraphs
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._dummy_run(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return func(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3855, in _dummy_run
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] outputs = self.model(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mixtral.py", line 604, in forward
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] hidden_states = self.model(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 399, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return TorchCompileWithNoGuardsWrapper.__call__(self, *args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/wrapper.py", line 152, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.forward(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/mixtral.py", line 351, in forward
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] def forward(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_dynamo/eval_frame.py", line 1044, in _fn
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return fn(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/caching.py", line 53, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.optimized_call(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 837, in call_wrapped
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._wrapped_call(self, *args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 413, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise e
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/fx/graph_module.py", line 400, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return super(self.cls, obj).__call__(*args, **kwargs) # type: ignore[misc]
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1786, in _call_impl
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "<eval_with_key>.66", line 177, in forward
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] submod_2 = self.submod_2(getitem_3, s72, l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_, l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_, getitem_4, l_self_modules_layers_modules_0_modules_block_sparse_moe_modules_gate_parameters_weight_, l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_, l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_, l_positions_, l_self_modules_layers_modules_0_modules_self_attn_modules_rotary_emb_buffers_cos_sin_cache_); getitem_3 = l_self_modules_layers_modules_0_modules_self_attn_modules_o_proj_parameters_weight_ = l_self_modules_layers_modules_0_modules_post_attention_layernorm_parameters_weight_ = getitem_4 = l_self_modules_layers_modules_0_modules_block_sparse_moe_modules_gate_parameters_weight_ = l_self_modules_layers_modules_1_modules_input_layernorm_parameters_weight_ = l_self_modules_layers_modules_1_modules_self_attn_modules_qkv_proj_parameters_weight_ = None
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/cuda_graph.py", line 126, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self.runnable(*args, **kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/piecewise_backend.py", line 107, in __call__
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] entry.runnable = self.vllm_backend.compiler_manager.compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/backends.py", line 233, in compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_graph, handle = self.compiler.compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/compiler_interface.py", line 232, in compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_graph = standalone_compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/__init__.py", line 422, in standalone_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return standalone_compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/standalone_compile.py", line 252, in standalone_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_fn = compile_fx(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2413, in compile_fx
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return compile_fx(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 2695, in compile_fx
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise e.remove_dynamo_frames() from None # see TORCHDYNAMO_VERBOSE=1
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 990, in _compile_fx_inner
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise InductorError(e, currentframe()).with_traceback(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 974, in _compile_fx_inner
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] mb_compiled_graph = fx_codegen_and_compile(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1695, in fx_codegen_and_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return scheme.codegen_and_compile(gm, example_inputs, inputs_to_check, graph_kwargs)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_fx.py", line 1505, in codegen_and_compile
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] compiled_module = graph.compile_to_module()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2319, in compile_to_module
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._compile_to_module()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2325, in _compile_to_module
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.codegen_with_cpp_wrapper() if self.cpp_wrapper else self.codegen()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2264, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.scheduler.codegen()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 5205, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._codegen_partitions()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 5345, in _codegen_partitions
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self._codegen(partition)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 5430, in _codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.codegen_extern_call(node)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/scheduler.py", line 4588, in codegen_extern_call
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] node.codegen(V.graph.wrapper_code)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/ir.py", line 6671, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] wrapper.codegen_subgraph_with_flattened_outputs(
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 3314, in codegen_subgraph_with_flattened_outputs
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.codegen_subgraph_common(subgraph)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 3307, in codegen_subgraph_common
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] subgraph_code, _ = subgraph.graph.codegen()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/graph.py", line 2271, in codegen
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] result = self.wrapper_code.generate(self.is_inference)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1552, in generate
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] return self._generate(is_inference)
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1615, in _generate
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] self.generate_and_run_autotune_block()
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/codegen/wrapper.py", line 1695, in generate_and_run_autotune_block
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] raise RuntimeError(f"Failed to run autotuning code block: {e}") from e
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815] torch._inductor.exc.InductorError: RuntimeError: Failed to run autotuning code block: name 'get_raw_stream' is not defined
(EngineCore_DP0 pid=26353) (Worker_TP7 pid=26373) ERROR 12-17 13:32:44 [multiproc_executor.py:815]
(EngineCore_DP0 pid=26353) ERROR 12-17 13:32:44 [multiproc_executor.py:230] Worker proc VllmWorker-3 died unexpectedly, shutting down executor.
(EngineCore_DP0 pid=26353) Process EngineCore_DP0:
(EngineCore_DP0 pid=26353) Traceback (most recent call last):
(EngineCore_DP0 pid=26353) File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=26353) self.run()
(EngineCore_DP0 pid=26353) File "/usr/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=26353) self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
(EngineCore_DP0 pid=26353) raise e
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 833, in run_engine_core
(EngineCore_DP0 pid=26353) engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=26353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 606, in __init__
(EngineCore_DP0 pid=26353) super().__init__(
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 109, in __init__
(EngineCore_DP0 pid=26353) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=26353) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core.py", line 247, in _initialize_kv_caches
(EngineCore_DP0 pid=26353) self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/abstract.py", line 116, in initialize_from_config
(EngineCore_DP0 pid=26353) self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 358, in collective_rpc
(EngineCore_DP0 pid=26353) return aggregate(get_response())
(EngineCore_DP0 pid=26353) ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=26353) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 341, in get_response
(EngineCore_DP0 pid=26353) raise RuntimeError(
(EngineCore_DP0 pid=26353) RuntimeError: Worker failed with error 'RuntimeError: Failed to run autotuning code block: name 'get_raw_stream' is not defined', please check the stack trace above for the root cause
A Temporary Workaround
While we're working on a more permanent solution, we've found a temporary fix. Adding a small function to vllm/env_override.py makes get_raw_stream resolvable even when the generated code calls it before importing it. The function imports _cuda_getCurrentRawStream from torch._C and installs it on Python's builtins module under the name get_raw_stream, unless that name already exists there. The patch is opt-in: it only runs when the environment variable VLLM_PATCH_GET_RAW_STREAM is set to 1:
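import os  # needed for os.getenv below


def _patch_get_raw_stream_if_needed():
    """Workaround for TorchInductor autotune using get_raw_stream() without defining it.

    Enable by setting VLLM_PATCH_GET_RAW_STREAM=1 in the environment.
    """
    # Opt-in only: do nothing unless the flag is explicitly set.
    if os.getenv("VLLM_PATCH_GET_RAW_STREAM", "0") != "1":
        return
    try:
        import builtins
        from torch._C import _cuda_getCurrentRawStream as _get_raw_stream
    except Exception:
        # The import failed (e.g. no CUDA-enabled torch): nothing to patch.
        return
    # Expose the function via builtins so the bare name resolves in generated code.
    if not hasattr(builtins, "get_raw_stream"):
        builtins.get_raw_stream = _get_raw_stream


_patch_get_raw_stream_if_needed()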
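This little hack seems to do the trick for now: the autotuning process proceeds without hitting the "name 'get_raw_stream' is not defined" error. The likely reason it works is that Python falls back to the builtins module when a name isn't found in a code block's globals, so the early get_raw_stream call resolves even though the import only appears later. We've filed this issue to keep track of it and are hoping for some insights from the Torch team to get a more robust, long-term fix in place.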
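To make the mechanism concrete, here is a minimal sketch that needs neither torch nor a GPU. GENERATED_BLOCK, fake_get_raw_stream, and run_block are hypothetical stand-ins invented for this illustration; they are not TorchInductor's actual generated code or API. The sketch only assumes that the autotune block is executed in a namespace where a name is used before it is bound, which is what the logs above suggest.

```python
import builtins

# Hypothetical stand-in for a generated autotune block: the name is called
# on the first line, but only bound on the second.
GENERATED_BLOCK = """
stream = get_raw_stream(7)
get_raw_stream = lambda device: ("late-binding", device)
"""


def fake_get_raw_stream(device):
    """Hypothetical stand-in for torch._C._cuda_getCurrentRawStream."""
    return ("raw-stream", device)


def run_block(source):
    """Run source in a fresh namespace; return its 'stream' value or the NameError raised."""
    namespace = {}
    try:
        exec(source, namespace)
    except NameError as exc:
        return exc
    return namespace["stream"]


# Without the patch, the early call fails just like in the traceback.
before = run_block(GENERATED_BLOCK)

# With the name installed on builtins, the lookup falls through globals to builtins.
builtins.get_raw_stream = fake_get_raw_stream
try:
    after = run_block(GENERATED_BLOCK)
finally:
    # Clean up so the demo does not leak a global name.
    del builtins.get_raw_stream

print(type(before).__name__, "->", before)
print(after)
```

The first call fails with a NameError, while the second resolves the early call through builtins. This is also why the workaround is a blunt instrument: anything placed on builtins is visible to every module in the process, which is a good reason to keep it behind an opt-in flag.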
Environment Details
Just to provide some context, here are the key details about the environment where we observed this issue:
- vLLM Version: 0.11.2
- Model: mistralai/Mixtral-8x7B-v0.1
- Torch Version: 2.9.0+cu129
- CUDA Version: 12.9
- Python Version: 3.12.12
- Operating System: Ubuntu 22.04.5 LTS
- GPUs: NVIDIA H200 (8x)
- vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled
We're keen to hear if others have encountered this or have any thoughts on the root cause or potential solutions. Your feedback is always appreciated!