Bases: ABC
Spec for an offloading connector
Source code in vllm/v1/kv_offload/spec.py
  instance-attribute
 offloaded_block_size = int(
    get("block_size", gpu_block_size)
)
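
The assignment above reads as "use a configured `block_size` if one is present, otherwise fall back to the GPU block size". A minimal sketch of that fallback follows. The receiver of `get` is not shown in the signature above, so the plain dict named `extra_config` and the helper `resolve_offloaded_block_size` are hypothetical stand-ins rather than vLLM API.

```python
def resolve_offloaded_block_size(extra_config: dict, gpu_block_size: int) -> int:
    """Return the configured block size, falling back to the GPU block size.

    `extra_config` is a hypothetical stand-in for whatever mapping the
    spec reads its "block_size" option from.
    """
    # int(...) normalizes values that arrive as strings, e.g. from JSON or a CLI.
    return int(extra_config.get("block_size", gpu_block_size))


# No override configured: the offloaded block size equals the GPU block size.
default_size = resolve_offloaded_block_size({}, gpu_block_size=16)

# Override configured (as a string here): it is converted to an int.
override_size = resolve_offloaded_block_size({"block_size": "64"}, gpu_block_size=16)
```

With these inputs, `default_size` is 16 and `override_size` is 64.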
 
 __init__(vllm_config: VllmConfig)
  abstractmethod
 get_handlers(
    kv_caches: dict[str, Tensor],
) -> Iterator[
    tuple[
        type[LoadStoreSpec],
        type[LoadStoreSpec],
        OffloadingHandler,
    ]
]
Get offloading handlers along with their respective src and dst types.
Parameters:
| Name | Type | Description | Default | 
|---|---|---|---|
| kv_caches | dict[str, Tensor] | A dictionary of layer_name -> gpu_kv_cache tensor. | required | 
Yields:
| Type | Description | 
|---|---|
| tuple[type[LoadStoreSpec], type[LoadStoreSpec], OffloadingHandler] | Tuples of (src_type, dst_type, offloading_handler). | 
  abstractmethod
 get_manager() -> OffloadingManager
Get an OffloadingManager that will be used by the scheduler-side offloading connector to track offloaded blocks and manage evictions.
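
To illustrate how the two abstract methods fit together, here is a rough, self-contained sketch of a concrete subclass. It does not import vLLM: every class below (`LoadStoreSpec`, `GPULoadStoreSpec`, `CPULoadStoreSpec`, `OffloadingHandler`, `OffloadingManager`, `OffloadingSpecSketch`, `ToyCPUSpec`) is an empty stand-in defined only for this example, and the real vLLM classes have richer interfaces and a constructor that takes a `VllmConfig`.

```python
from abc import ABC, abstractmethod
from collections.abc import Iterator
from typing import Any


# Empty stand-ins for the types named in the signatures above (assumptions).
class LoadStoreSpec: ...
class GPULoadStoreSpec(LoadStoreSpec): ...  # hypothetical "GPU" medium type
class CPULoadStoreSpec(LoadStoreSpec): ...  # hypothetical "CPU" medium type
class OffloadingHandler: ...
class OffloadingManager: ...


class OffloadingSpecSketch(ABC):
    """Simplified mirror of the abstract interface documented above."""

    @abstractmethod
    def get_manager(self) -> OffloadingManager: ...

    @abstractmethod
    def get_handlers(
        self, kv_caches: dict[str, Any]
    ) -> Iterator[tuple[type[LoadStoreSpec], type[LoadStoreSpec], OffloadingHandler]]: ...


class ToyCPUSpec(OffloadingSpecSketch):
    """Toy subclass: one manager, one handler serving both transfer directions."""

    def __init__(self) -> None:
        self._manager: OffloadingManager | None = None

    def get_manager(self) -> OffloadingManager:
        # Create the manager lazily and reuse it on later calls.
        if self._manager is None:
            self._manager = OffloadingManager()
        return self._manager

    def get_handlers(self, kv_caches):
        # kv_caches maps layer_name -> KV cache tensor; unused in this toy.
        handler = OffloadingHandler()
        # Yield (src_type, dst_type, handler), one tuple per direction.
        yield GPULoadStoreSpec, CPULoadStoreSpec, handler  # store (offload)
        yield CPULoadStoreSpec, GPULoadStoreSpec, handler  # load (restore)


spec = ToyCPUSpec()
handlers = list(spec.get_handlers({"layer.0": object()}))
```

Because `get_handlers` is written as a generator, callers can iterate over the yielded `(src_type, dst_type, handler)` tuples and register each handler under its source and destination types. Sharing one handler across both directions is a choice made for this sketch, not something the interface requires.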