Bases: Attention
Source code in vllm/attention/layers/chunked_local_attention.py
__init__(
    num_heads: int,
    head_size: int,
    scale: float,
    attention_chunk_size: int,
    num_kv_heads: int | None = None,
    alibi_slopes: list[float] | None = None,
    cache_config: CacheConfig | None = None,
    quant_config: QuantizationConfig | None = None,
    kv_sharing_target_layer_name: str | None = None,
    prefix: str = "",
)
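The constructor mirrors the base `Attention` layer and adds `attention_chunk_size`, which bounds how far each query can attend. Below is a minimal construction sketch; the numeric values (32 query heads of size 128, 8 KV heads, an 8192-token chunk) and the layer name are illustrative assumptions, not library defaults, and the layer is assumed to be created during model initialization, where vLLM's config context is active.

```python
# Illustrative sketch, not library defaults: a hypothetical decoder layer
# building chunked local attention with grouped-query KV heads.
from vllm.attention.layers.chunked_local_attention import ChunkedLocalAttention

attn = ChunkedLocalAttention(
    num_heads=32,               # assumed query-head count
    head_size=128,              # assumed per-head dimension
    scale=128**-0.5,            # conventional 1/sqrt(head_size) scaling
    attention_chunk_size=8192,  # queries attend only within 8192-token chunks
    num_kv_heads=8,             # grouped-query attention; None falls back to num_heads
    prefix="model.layers.0.self_attn.attn",  # hypothetical layer name
)
```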
get_kv_cache_spec(vllm_config: VllmConfig) -> KVCacheSpec
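`get_kv_cache_spec` reports how the layer's KV cache should be allocated under a given engine configuration. A small inspection sketch, assuming the `attn` layer from the example above and a `VllmConfig` instance `vllm_config` taken from a running engine:

```python
# Sketch: query the layer's KV-cache layout for the current configuration.
# `attn` and `vllm_config` are assumed to exist (see the sketch above).
spec = attn.get_kv_cache_spec(vllm_config)
print(type(spec).__name__)  # the concrete KVCacheSpec subclass chosen
print(spec.block_size)      # KV-cache block size, in tokens
```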
cached
create_chunked_local_attention_backend(
    underlying_attn_backend: AttentionBackend,
    attention_chunk_size: int,
    block_size: int,
) -> type[AttentionBackend]
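A usage sketch for the factory follows. `FlashAttentionBackend` is one plausible choice of underlying backend and the sizes are illustrative; the import path is an assumption about the v1 backend layout. Since the factory is cached (see the `cached` marker above), repeating an identical call returns the same generated class.

```python
# Sketch: derive a chunked-local variant of an existing attention backend.
# FlashAttentionBackend and the sizes below are illustrative assumptions.
from vllm.v1.attention.backends.flash_attn import FlashAttentionBackend
from vllm.attention.layers.chunked_local_attention import (
    create_chunked_local_attention_backend,
)

local_backend_cls = create_chunked_local_attention_backend(
    FlashAttentionBackend,
    8192,  # attention_chunk_size: tokens per local-attention chunk
    16,    # block_size: KV-cache block size in tokens
)

# The factory is cached, so the identical call yields the same class object.
assert local_backend_cls is create_chunked_local_attention_backend(
    FlashAttentionBackend, 8192, 16
)
```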