# OpenTelemetry Integration Guide

This guide explains how to set up OpenTelemetry tracing in TensorRT-LLM to monitor and debug your LLM inference services.
## Install OpenTelemetry

Install the required OpenTelemetry packages:

```bash
pip install \
  'opentelemetry-sdk' \
  'opentelemetry-api' \
  'opentelemetry-exporter-otlp' \
  'opentelemetry-semantic-conventions-ai'
```
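If you want to confirm the packages installed correctly before starting the server, a quick import check works; this is only a convenience and is not required by TensorRT-LLM:

```bash
# Sanity check: the OTLP gRPC span exporter should import cleanly
python -c "from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter; print('OpenTelemetry OTLP exporter available')"
```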
## Start Jaeger

You can start Jaeger with Docker:

```bash
docker run --rm --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.57.0
```
Or run the `jaeger-all-in-one(.exe)` executable from the binary distribution archives:

```bash
jaeger-all-in-one --collector.zipkin.host-port=:9411
```
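Before wiring up TensorRT-LLM, you can check that Jaeger is reachable, assuming the default ports used above (16686 for the UI and query API, 4317 for OTLP over gRPC):

```bash
# Should return a JSON list of known services once Jaeger is up
curl -s http://localhost:16686/api/services
```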
## Set up environment variables and run TensorRT-LLM

Set up the environment variables:

```bash
export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
export OTEL_SERVICE_NAME="trtllm-server"
```
Then run TensorRT-LLM with OpenTelemetry enabled, making sure `return_perf_metrics` is set to `true` in the model configuration:

```bash
trtllm-serve models/Qwen3-8B/ --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
```
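How you set `return_perf_metrics` depends on your deployment. One common route, sketched here under the assumption that your TensorRT-LLM version accepts `return_perf_metrics` through the extra LLM API options file, is the `--extra_llm_api_options` flag of `trtllm-serve`:

```bash
# Illustrative sketch: enable perf metrics via an extra LLM API options file
cat > extra_llm_api_options.yaml <<EOF
return_perf_metrics: true
EOF

trtllm-serve models/Qwen3-8B/ \
  --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" \
  --extra_llm_api_options extra_llm_api_options.yaml
```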
## Send requests and find traces in Jaeger

You can send a request to the server and view the traces in the Jaeger UI. The traces should be visible under the service name `trtllm-server`.
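For example, assuming `trtllm-serve` is listening on its default port 8000 and the served model name matches the path used above, you can exercise the OpenAI-compatible endpoint and then look up the resulting traces through the Jaeger query API:

```bash
# Send a chat completion request (model name assumed to match the served path)
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "models/Qwen3-8B/",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 32
      }'

# List recent traces for the service through the Jaeger query API,
# or browse them at http://localhost:16686
curl -s "http://localhost:16686/api/traces?service=trtllm-server&limit=5"
```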
## Configuration for Disaggregated Serving

For disaggregated serving, the context (ctx) and generation (gen) servers are configured the same way as in the standalone setup above. For the proxy, you can configure tracing as follows:
```yaml
# disagg_config.yaml
hostname: 127.0.0.1
port: 8000
backend: pytorch
context_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8001"
generation_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8002"
otlp_config:
  otlp_traces_endpoint: "grpc://0.0.0.0:4317"
```
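You would then start the proxy with this file; assuming the standard disaggregated entry point of `trtllm-serve`, the launch looks like:

```bash
# Launch the disaggregated proxy using the config above
trtllm-serve disaggregated -c disagg_config.yaml
```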