Bases: AttentionMetadataBuilder[M], ABC
Source code in vllm/v1/attention/backends/mamba_attn.py
cudagraph_support  class-attribute

cudagraph_support: AttentionCGSupport = (
    UNIFORM_SINGLE_TOKEN_DECODE
)

decode_cudagraph_max_bs  instance-attribute

decode_cudagraph_max_bs = min(
    max_num_seqs, max_cudagraph_capture_size
)

state_indices_tensor  instance-attribute

state_indices_tensor = empty(
    (decode_cudagraph_max_bs,), dtype=int32, device=device
)
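As a rough illustration of how these attributes fit together (a minimal standalone sketch in plain PyTorch; max_num_seqs and max_cudagraph_capture_size are stand-in values rather than fields read from a real VllmConfig): the builder sizes a persistent index buffer for the largest decode batch a captured CUDA graph may replay, and smaller batches reuse a prefix of that same buffer.

import torch

# Stand-in values for the config-derived limits (assumptions for illustration).
max_num_seqs = 256                  # scheduler limit on concurrent sequences
max_cudagraph_capture_size = 128    # largest batch size captured as a CUDA graph
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Persistent buffer sized once, at construction time, for the worst-case decode batch.
decode_cudagraph_max_bs = min(max_num_seqs, max_cudagraph_capture_size)
state_indices_tensor = torch.empty(
    (decode_cudagraph_max_bs,), dtype=torch.int32, device=device
)

# Later, a smaller decode batch writes its Mamba state slot indices into a prefix
# of the buffer, so the captured graph always reads from the same storage.
num_decodes = 32
state_indices_tensor[:num_decodes].copy_(
    torch.arange(num_decodes, dtype=torch.int32, device=device)
)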
 
 __init__(
    kv_cache_spec: AttentionSpec,
    layer_names: list[str],
    vllm_config: VllmConfig,
    device: device,
)
Source code in vllm/v1/attention/backends/mamba_attn.py
  
 build_for_cudagraph_capture(
    common_attn_metadata: CommonAttentionMetadata,
) -> M
This method builds the metadata for full cudagraph capture. Currently, only decode is supported for full cudagraphs with Mamba.
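A minimal sketch of the uniform single-token decode padding this implies (illustrative only: the DecodeMetadata dataclass below is a simplified stand-in, not vLLM's CommonAttentionMetadata, and pad_slot is an assumed placeholder value). Real decode requests occupy the front of the persistent buffer and the remainder is padded, so the captured graph always sees a batch of size decode_cudagraph_max_bs.

from dataclasses import dataclass
import torch

@dataclass
class DecodeMetadata:  # simplified stand-in, not vLLM's CommonAttentionMetadata
    num_decodes: int
    state_indices_tensor: torch.Tensor

def build_padded_decode_metadata(
    real_state_indices: torch.Tensor,    # one Mamba state slot index per decode request
    state_indices_tensor: torch.Tensor,  # persistent buffer of size decode_cudagraph_max_bs
    pad_slot: int = 0,                   # assumed padding value for unused graph slots
) -> DecodeMetadata:
    num_decodes = real_state_indices.numel()
    max_bs = state_indices_tensor.numel()
    # Full CUDA graph replay only covers pure single-token decode batches,
    # so the batch must fit inside the pre-sized buffer.
    assert num_decodes <= max_bs
    state_indices_tensor[:num_decodes].copy_(real_state_indices)
    state_indices_tensor[num_decodes:].fill_(pad_slot)  # pad to the captured batch size
    return DecodeMetadata(num_decodes=max_bs, state_indices_tensor=state_indices_tensor)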