Configuration for KV event publishing.
Source code in vllm/config/kv_events.py
 buffer_steps: int = 10000
The number of steps to cache for the replay endpoint. Only events from the last N steps are kept and can be replayed.
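The "last N steps" behavior can be pictured as a fixed-size ring buffer. The sketch below is only an illustration of that idea, not the vLLM implementation: `ReplayBuffer`, `record`, and `replay_from` are hypothetical names, and the payloads are placeholder bytes.

```python
from collections import deque


class ReplayBuffer:
    """Hypothetical helper: keeps only the most recent `buffer_steps` entries."""

    def __init__(self, buffer_steps: int) -> None:
        # A deque with maxlen silently discards the oldest entry when full.
        self._buf: deque[tuple[int, bytes]] = deque(maxlen=buffer_steps)

    def record(self, seq: int, payload: bytes) -> None:
        """Store the events published for one step under its sequence number."""
        self._buf.append((seq, payload))

    def replay_from(self, start_seq: int) -> list[tuple[int, bytes]]:
        """Return the retained entries whose sequence number is >= start_seq."""
        return [(s, p) for s, p in self._buf if s >= start_seq]


buf = ReplayBuffer(buffer_steps=3)
for step in range(5):
    buf.record(step, f"events-{step}".encode())

# Steps 0 and 1 have already been evicted, so only steps 2, 3 and 4 remain.
replayed = buf.replay_from(0)
```

With this shape, a consumer that falls more than `buffer_steps` steps behind cannot recover the missing events through replay.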
 enable_kv_cache_events: bool = False
If True, enable KV cache events, which track when blocks are stored and removed. Events can be published externally over ZMQ using the event publisher config.
 endpoint: str = 'tcp://*:5557'
The ZMQ endpoint to use for publishing KV events.
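The default endpoint uses the ZMQ wildcard host `*`, which is only meaningful when binding a socket. A subscriber needs a concrete host to connect to. A minimal sketch of that translation, assuming a `tcp://host:port` style endpoint; `to_connect_endpoint` and the host name `localhost` are hypothetical and not part of vLLM:

```python
def to_connect_endpoint(bind_endpoint: str, host: str) -> str:
    """Replace the wildcard host `*` in a bind endpoint with a concrete host."""
    scheme, sep, rest = bind_endpoint.partition("://")
    if not sep:
        raise ValueError(f"not a valid endpoint: {bind_endpoint!r}")
    addr, colon, port = rest.rpartition(":")
    if not colon:
        # No host:port pair (for example an ipc:// path); leave it unchanged.
        return bind_endpoint
    if addr == "*":
        addr = host
    return f"{scheme}://{addr}:{port}"


connect_to = to_connect_endpoint("tcp://*:5557", "localhost")
unchanged = to_connect_endpoint("tcp://10.0.0.1:5557", "localhost")
```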
 hwm: int = 100000
The ZMQ high water mark for the event publisher. Once N events are queued, further events are dropped if the consumer is not keeping up.
 max_queue_size: int = 100000
The maximum number of events to queue while waiting for publishing.
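A bounded queue of this kind can be sketched with the standard library. The description above does not say what happens when the queue is full, so the drop-on-full policy here is an assumption made for illustration, and `enqueue_event` is a hypothetical helper rather than a vLLM function.

```python
import queue


def enqueue_event(q: queue.Queue, event: bytes) -> bool:
    """Try to queue an event without blocking; return False if it was dropped."""
    try:
        q.put_nowait(event)
        return True
    except queue.Full:
        # Assumed policy: drop the new event instead of blocking the producer.
        return False


# A tiny maxsize stands in for max_queue_size so the overflow is easy to see.
pending: queue.Queue = queue.Queue(maxsize=2)
accepted = [enqueue_event(pending, f"e{i}".encode()) for i in range(4)]
```

Whatever the actual policy is, a larger `max_queue_size` trades memory for more tolerance of a slow publisher.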
 publisher: Literal['null', 'zmq'] = Field(default=None)
The publisher to use for publishing KV events. Either "null" or "zmq".
 replay_endpoint: str | None = None
The ZMQ endpoint to use for replaying KV events.