module-attribute  ¶
 EMPTY_MODEL_RUNNER_OUTPUT = ModelRunnerOutput(
    req_ids=[],
    req_id_to_index={},
    sampled_token_ids=[],
    logprobs=None,
    prompt_logprobs_dict={},
    pooler_output=[],
    num_nans_in_logits=None,
)
 
AsyncModelRunnerOutput
  Bases: ABC
Source code in vllm/v1/outputs.py
  abstractmethod  ¶
 get_output() -> ModelRunnerOutput
Get the ModelRunnerOutput for this async output.
This is a blocking call that waits until the results are ready, which might involve copying device tensors to the host. This method should only be called once per AsyncModelRunnerOutput.
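The blocking, call-once contract described above can be illustrated with a minimal sketch. It does not import vLLM: `FakeRunnerOutput`, `FakeAsyncOutput`, `ThreadedAsyncOutput`, and `_slow_copy` are hypothetical stand-ins, and raising on a second call is an assumption made here for illustration (the documentation only says the method should be called once).

```python
from abc import ABC, abstractmethod
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class FakeRunnerOutput:
    """Hypothetical stand-in for ModelRunnerOutput (two fields only)."""
    req_ids: list[str] = field(default_factory=list)
    sampled_token_ids: list[list[int]] = field(default_factory=list)


class FakeAsyncOutput(ABC):
    """Hypothetical stand-in mirroring the abstract get_output() interface."""

    @abstractmethod
    def get_output(self) -> FakeRunnerOutput:
        ...


class ThreadedAsyncOutput(FakeAsyncOutput):
    """Wraps a Future that stands in for a pending device-to-host copy."""

    def __init__(self, future: Future) -> None:
        self._future = future
        self._consumed = False

    def get_output(self) -> FakeRunnerOutput:
        # Assumption: enforce the "call once" rule by raising on reuse.
        if self._consumed:
            raise RuntimeError("get_output() may only be called once")
        self._consumed = True
        # Blocks until the background work has finished.
        return self._future.result()


def _slow_copy() -> FakeRunnerOutput:
    # Placeholder for work such as copying sampled tokens to the host.
    return FakeRunnerOutput(req_ids=["req-0"], sampled_token_ids=[[11, 42]])


with ThreadPoolExecutor(max_workers=1) as pool:
    pending = ThreadedAsyncOutput(pool.submit(_slow_copy))
    result = pending.get_output()  # blocking call, made exactly once
```

A caller would typically keep the pending object until the result is actually needed, then call `get_output()` a single time and hold on to the returned value.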
   class-attribute instance-attribute  ¶
 kv_connector_stats: KVConnectorStats | None = None
 
  Bases: NamedTuple
LogprobsTensors
  Bases: NamedTuple
  staticmethod  ¶
 empty_cpu(
    num_positions: int, num_tokens_per_position: int
) -> LogprobsTensors
Create empty LogprobsTensors on CPU.
ModelRunnerOutput  dataclass
  class-attribute instance-attribute  ¶
 kv_connector_output: KVConnectorOutput | None = None
 __init__(
    req_ids: list[str],
    req_id_to_index: dict[str, int],
    sampled_token_ids: list[list[int]],
    logprobs: LogprobsLists | None,
    prompt_logprobs_dict: dict[str, LogprobsTensors | None],
    pooler_output: list[Tensor | None],
    kv_connector_output: KVConnectorOutput | None = None,
    num_nans_in_logits: dict[str, int] | None = None,
) -> None
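As a rough illustration of how the constructor arguments fit together, the sketch below builds a stand-in object with the same field names and looks up one request's sampled tokens. `ModelRunnerOutputSketch` and `tokens_for` are hypothetical and not part of vLLM; types that would require vLLM or PyTorch imports are replaced with `Any`. It also assumes that `req_id_to_index` maps each request ID to its position in `req_ids` and `sampled_token_ids`, which the field name suggests but this page does not state.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModelRunnerOutputSketch:
    """Hypothetical stand-in with the same field names as the signature above."""
    req_ids: list[str]
    req_id_to_index: dict[str, int]
    sampled_token_ids: list[list[int]]
    logprobs: Optional[Any]
    prompt_logprobs_dict: dict[str, Optional[Any]]
    pooler_output: list[Optional[Any]]
    kv_connector_output: Optional[Any] = None
    num_nans_in_logits: Optional[dict[str, int]] = None


req_ids = ["req-a", "req-b"]
output = ModelRunnerOutputSketch(
    req_ids=req_ids,
    # Assumed convention: index of each request within the batch.
    req_id_to_index={rid: i for i, rid in enumerate(req_ids)},
    sampled_token_ids=[[101], [7, 8]],
    logprobs=None,
    prompt_logprobs_dict={},
    pooler_output=[],
)


def tokens_for(out: ModelRunnerOutputSketch, req_id: str) -> list[int]:
    """Return the sampled tokens for one request via the index map."""
    return out.sampled_token_ids[out.req_id_to_index[req_id]]


tokens_b = tokens_for(output, "req-b")
```

The two trailing parameters default to `None`, so they can be omitted when no KV connector output or NaN counts are available.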