Configuration for distributed KV cache transfer.
Source code in vllm/config/kv_transfer.py
 enable_permute_local_kv: bool = False
Experimental feature flag to enable HND-to-NHD KV transfer.
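To make the two layouts concrete, assume the common reading of these names: per KV block, HND is (num_heads, num_tokens, head_dim) and NHD is (num_tokens, num_heads, head_dim). Under that assumption, converting between them swaps the first two axes. The sketch below shows this with plain nested lists and a hypothetical `hnd_to_nhd` helper. It is not vLLM's implementation, which works on GPU tensors.

```python
# Hypothetical illustration; hnd_to_nhd is not a vLLM API.
# HND: kv[head][token][dim]  ->  NHD: kv[token][head][dim]

def hnd_to_nhd(kv_hnd):
    """Swap the head and token axes of a (heads, tokens, dim) nested list."""
    num_heads = len(kv_hnd)
    num_tokens = len(kv_hnd[0]) if num_heads else 0
    return [
        [list(kv_hnd[h][t]) for h in range(num_heads)]
        for t in range(num_tokens)
    ]

# 2 heads, 3 tokens, head_dim 2; each value encodes 100*head + 10*token + dim.
kv_hnd = [[[100 * h + 10 * t + d for d in range(2)] for t in range(3)] for h in range(2)]
kv_nhd = hnd_to_nhd(kv_hnd)
print(kv_nhd[1][0])  # token 1, head 0 -> [10, 11]
```

The element values are unchanged. Only their ordering in memory differs, which is why a permutation step is needed when the two sides of a transfer disagree on layout.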
 engine_id: str | None = None
The engine id for KV transfers.
 kv_buffer_device: str = 'cuda'
The device the KV connector uses to buffer the KV cache. Choices are 'cuda' and 'cpu'.
 kv_buffer_size: float = 1000000000.0
The buffer size for TorchDistributedConnector, in bytes. Recommended value: 1e9 (about 1 GB).
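Because the size is given in bytes, it can help to translate it into tokens of KV cache. Assume the buffer holds raw K and V tensors. Then each token needs 2 × layers × KV heads × head_dim × bytes per element. The sketch below plugs in hypothetical model dimensions (32 layers, 8 KV heads, head_dim 128, fp16) to estimate how many tokens 1e9 bytes covers. Real occupancy depends on the connector and its bookkeeping.

```python
# Rough sizing sketch; the model dimensions below are hypothetical examples.
KV_BUFFER_SIZE = 1e9  # bytes, the recommended value

def kv_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes):
    # Factor 2 accounts for storing both K and V.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

per_token = kv_bytes_per_token(num_layers=32, num_kv_heads=8, head_dim=128, dtype_bytes=2)
tokens_that_fit = int(KV_BUFFER_SIZE // per_token)
print(per_token, tokens_that_fit)           # 131072 7629
print(f"{KV_BUFFER_SIZE / 2**30:.2f} GiB")  # 0.93 GiB
```

The last line also shows why "about 1 GB" is approximate: 1e9 bytes is 1 GB in decimal units, or roughly 0.93 GiB.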
 kv_connector: str | None = None
The KV connector for vLLM to transmit KV caches between vLLM instances.
Any extra config that the connector may need.
 kv_connector_module_path: str | None = None
The Python module path to dynamically load the KV connector from. Only supported in V1.
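Dynamic loading of this kind is typically done with the standard `importlib` module. The sketch makes one assumption: the connector class is looked up by the `kv_connector` name inside the module named by `kv_connector_module_path`. `load_connector_class` is a hypothetical helper, not vLLM's loader. The demo uses a standard-library module so it runs without vLLM installed.

```python
import importlib

def load_connector_class(module_path: str, class_name: str):
    """Hypothetical helper: import `module_path` and return its `class_name` attribute."""
    module = importlib.import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        raise ImportError(f"{module_path!r} has no attribute {class_name!r}") from exc

# A standard-library module stands in for a real connector package here.
cls = load_connector_class("collections", "OrderedDict")
print(cls.__name__)  # OrderedDict
```

Re-raising a missing attribute as `ImportError` keeps both failure modes, a bad module path and a bad class name, under one exception type for the caller.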
 kv_ip: str = '127.0.0.1'
The KV connector IP address, used to establish the distributed connection.
 kv_parallel_size: int = 1
The number of parallel instances for KV cache transfer. For P2pNcclConnector, this should be 2.
 kv_port: int = 14579
The KV connector port, used to establish the distributed connection.
 kv_rank: int | None = None
The rank of this vLLM instance in the KV cache transfer. Typical values: 0 for the prefill instance, 1 for the decode instance. Currently only 1P1D (one prefill instance, one decode instance) is supported.
 kv_role: KVRole | None = None
Whether this vLLM instance produces KV cache, consumes it, or both. Choices are 'kv_producer', 'kv_consumer', and 'kv_both'.
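In a 1P1D deployment, the two instances differ only in role and rank. The sketch uses a hypothetical `KVPeerSettings` dataclass that mirrors three of the fields above. It is not `KVTransferConfig` itself. Its check that `kv_rank` lies in `[0, kv_parallel_size)` is an assumption made for illustration, not a documented vLLM validation.

```python
from dataclasses import dataclass
from typing import Literal

KVRoleName = Literal["kv_producer", "kv_consumer", "kv_both"]

@dataclass
class KVPeerSettings:
    """Hypothetical mirror of a few KVTransferConfig fields, for illustration only."""
    kv_role: KVRoleName
    kv_rank: int
    kv_parallel_size: int = 2

    def __post_init__(self):
        # Type hints are not enforced at runtime, so validate the role explicitly.
        if self.kv_role not in ("kv_producer", "kv_consumer", "kv_both"):
            raise ValueError(f"unknown kv_role: {self.kv_role!r}")
        if not 0 <= self.kv_rank < self.kv_parallel_size:
            raise ValueError("kv_rank must be in [0, kv_parallel_size)")

# 1P1D: rank 0 prefills and produces KV, rank 1 decodes and consumes it.
prefill = KVPeerSettings(kv_role="kv_producer", kv_rank=0)
decode = KVPeerSettings(kv_role="kv_consumer", kv_rank=1)
```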
 compute_hash() -> str
Provide a hash that uniquely identifies all the configs that affect the structure of the computation graph from input IDs/embeddings to the final hidden states, excluding anything before input IDs/embeddings and after the final hidden states.
WARNING: Whenever a new field is added to this config, ensure that it is included in the factors list if it affects the computation graph.
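The general pattern can be sketched as follows: gather only the graph-affecting values into a factors list, then hash its string form. The sketch uses SHA-256 and a hypothetical `GRAPH_AFFECTING_FIELDS` tuple, which it assumes is empty. `sketch_compute_hash` is not vLLM's method. Which fields vLLM actually includes, and which hash function it uses, are not stated here, so treat both as assumptions.

```python
import hashlib
from typing import Any

# Hypothetical list of fields that affect the computation graph. Per the WARNING,
# any new graph-affecting field must be added here; this sketch assumes none are.
GRAPH_AFFECTING_FIELDS: tuple[str, ...] = ()

def sketch_compute_hash(config: dict[str, Any]) -> str:
    """Hash only the graph-affecting factors; all other fields are ignored."""
    factors = [config[name] for name in GRAPH_AFFECTING_FIELDS]
    return hashlib.sha256(str(factors).encode()).hexdigest()

a = sketch_compute_hash({"kv_ip": "127.0.0.1", "kv_port": 14579})
b = sketch_compute_hash({"kv_ip": "10.0.0.2", "kv_port": 20001})
print(a == b)  # True: transport-only settings do not change the hash
```

Leaving transport settings such as `kv_ip` and `kv_port` out of the factors means that moving an instance to a different address does not, in this sketch, produce a different graph hash.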