Setup OpenTelemetry POC
Source: https://github.com/vllm-project/vllm/tree/main/examples/online_serving/opentelemetry
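-  Install OpenTelemetry packages. The list below is inferred from the imports in dummy_client.py; the linked source may pin specific versions:
    pip install opentelemetry-api opentelemetry-sdk opentelemetry-exporter-otlp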
-  Start Jaeger in a Docker container:
    # From: https://www.jaegertracing.io/docs/1.57/getting-started/
    docker run --rm --name jaeger \
      -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
      -p 6831:6831/udp \
      -p 6832:6832/udp \
      -p 5778:5778 \
      -p 16686:16686 \
      -p 4317:4317 \
      -p 4318:4318 \
      -p 14250:14250 \
      -p 14268:14268 \
      -p 14269:14269 \
      -p 9411:9411 \
      jaegertracing/all-in-one:1.57
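-  In a new shell, export the Jaeger IP and the OTLP traces endpoint:
    export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
   Then set vLLM's service name for OpenTelemetry, enable insecure connections to Jaeger, and run vLLM:
    export OTEL_SERVICE_NAME="vllm-server"
    export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
    vllm serve facebook/opt-125m --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"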
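-  In a new shell, send requests with trace context from a dummy client (dummy_client.py, listed under Example materials). The client's exporter reads the same environment variables, so set them in this shell as well:
    export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
    export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
    export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
    # The service name is an arbitrary label for the client's spans in Jaeger.
    export OTEL_SERVICE_NAME="client-service"
    python dummy_client.py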
-  Open the Jaeger web UI: http://localhost:16686/ In the search pane, select the vllm-server service and click Find Traces. You should get a list of traces, one for each request.
-  Clicking on a trace shows its spans and their tags. In this demo, each trace has two spans: one from the dummy client, containing the prompt text, and one from vLLM, containing metadata about the request.
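The trace context that the dummy client sends travels as a W3C `traceparent` HTTP header, which `TraceContextTextMapPropagator().inject(headers)` adds to the request. The following is a minimal standard-library sketch of that header's layout, not the OpenTelemetry or vLLM implementation; the helper names `make_traceparent` and `parse_traceparent` are hypothetical and exist only for illustration.

```python
import re
import secrets

# W3C Trace Context, version 00: version-traceid-parentid-flags, lowercase hex.
_TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def make_traceparent(sampled: bool = True) -> str:
    """Build a version-00 traceparent value with random IDs (hypothetical helper)."""
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex characters
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex characters
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(value: str) -> dict:
    """Split a version-00 traceparent value into its fields (hypothetical helper)."""
    match = _TRACEPARENT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are invalid according to the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError(f"invalid all-zero ID in traceparent: {value!r}")
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }


# What a client would attach to its request, and what a server would read back.
headers = {"traceparent": make_traceparent()}
print(headers)
print(parse_traceparent(headers["traceparent"]))
```

A server that receives this header can start its own span under the same trace ID, which is why the client span and the vLLM span show up in a single trace in Jaeger.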
Exporter Protocol
OpenTelemetry supports either grpc or http/protobuf as the transport protocol for trace data in the exporter. By default, grpc is used. To set http/protobuf as the protocol, configure the OTEL_EXPORTER_OTLP_TRACES_PROTOCOL environment variable as follows:
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=http/protobuf
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=http://$JAEGER_IP:4318/v1/traces
vllm serve facebook/opt-125m --otlp-traces-endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
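The pairing of protocol and endpoint can be captured in a small helper, which makes it harder to mix a gRPC endpoint with the HTTP protocol setting. This is a sketch under assumptions: the function name `otlp_traces_env` is hypothetical, the host value in the usage lines is a placeholder, and ports 4317 (gRPC) and 4318 (HTTP) are the ones published by the Jaeger container in the setup steps.

```python
def otlp_traces_env(host: str, protocol: str = "grpc") -> dict[str, str]:
    """Return exporter environment variables for a protocol (hypothetical helper)."""
    if protocol == "grpc":
        # gRPC endpoint, as used in the setup steps.
        endpoint = f"grpc://{host}:4317"
    elif protocol == "http/protobuf":
        # The HTTP endpoint includes the /v1/traces path.
        endpoint = f"http://{host}:4318/v1/traces"
    else:
        raise ValueError(f"unsupported OTLP traces protocol: {protocol!r}")
    return {
        "OTEL_EXPORTER_OTLP_TRACES_PROTOCOL": protocol,
        "OTEL_EXPORTER_OTLP_TRACES_ENDPOINT": endpoint,
    }


# Placeholder host; in the steps above this would be the value of $JAEGER_IP.
for name, value in otlp_traces_env("172.17.0.2", "http/protobuf").items():
    print(f"export {name}={value}")
```

Printing `export` lines keeps the helper side-effect free; the output can be pasted into the shell that runs `vllm serve`.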
Instrumentation of FastAPI
OpenTelemetry allows automatic instrumentation of FastAPI.
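-  Install the instrumentation library:
    pip install opentelemetry-instrumentation-fastapi
-  Run vLLM with opentelemetry-instrument:
    opentelemetry-instrument vllm serve facebook/opt-125m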
-  Send a request to vLLM and find its trace in Jaeger. It should contain spans from FastAPI. 
Example materials
dummy_client.py
# SPDX-License-Identifier: Apache-2.0
# SPDX-FileCopyrightText: Copyright contributors to the vLLM project
import requests
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter
from opentelemetry.trace import SpanKind, set_tracer_provider
from opentelemetry.trace.propagation.tracecontext import TraceContextTextMapPropagator
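
# Register a tracer provider that exports spans both to the OTLP endpoint
# (configured through the OTEL_EXPORTER_OTLP_TRACES_* environment variables)
# and to the console.
trace_provider = TracerProvider()
set_tracer_provider(trace_provider)
trace_provider.add_span_processor(BatchSpanProcessor(OTLPSpanExporter()))
trace_provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))

tracer = trace_provider.get_tracer("dummy-client")
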
url = "http://localhost:8000/v1/completions"
with tracer.start_as_current_span("client-span", kind=SpanKind.CLIENT) as span:
    prompt = "San Francisco is a"
    span.set_attribute("prompt", prompt)
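    # Inject the current span's context into the request headers so that
    # vLLM can attach its span to the same trace.
    headers = {}
    TraceContextTextMapPropagator().inject(headers)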
    payload = {
        "model": "facebook/opt-125m",
        "prompt": prompt,
        "max_tokens": 10,
        "n": 3,
        "use_beam_search": True,
        "temperature": 0.0,
        # "stream": True,
    }
    response = requests.post(url, headers=headers, json=payload)
    # Surface HTTP errors instead of ignoring a failed request.
    response.raise_for_status()
