# OpenTelemetry Integration Guide
This guide explains how to set up OpenTelemetry tracing in TensorRT-LLM to monitor and debug your LLM inference services.
## Install OpenTelemetry
Install the required OpenTelemetry packages:
```bash
pip install \
  'opentelemetry-sdk' \
  'opentelemetry-api' \
  'opentelemetry-exporter-otlp' \
  'opentelemetry-semantic-conventions-ai'
```
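To confirm the installation before wiring up TensorRT-LLM, you can try importing the gRPC span exporter (a quick sanity check, assuming a standard Python environment):
```bash
# Should print the message below if the OTLP gRPC exporter is installed.
python -c "from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter; print('OpenTelemetry OTLP exporter available')"
```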
## Start Jaeger
You can start Jaeger with Docker:
```bash
docker run --rm --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.57.0
```
Or run the jaeger-all-in-one(.exe) executable from [the binary distribution archives](https://www.jaegertracing.io/download/):
```bash
jaeger-all-in-one --collector.zipkin.host-port=:9411
```
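Either way, you can verify that Jaeger is up by checking the UI port before sending any traces (assuming the default ports used above):
```bash
# Expect HTTP 200 once the Jaeger UI is reachable on its default port.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:16686/
```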
## Set up environment variables and run TensorRT-LLM
Set up the environment variables:
```bash
export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
export OTEL_SERVICE_NAME="trtllm-server"
```
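If you started Jaeger from the binary distribution instead of Docker, there is no container to inspect; in that case you can skip `JAEGER_IP` and point the exporter at localhost on the same default OTLP gRPC port:
```bash
# Jaeger running locally from the binary distribution.
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://localhost:4317
```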
Then run TensorRT-LLM with OpenTelemetry tracing enabled, making sure `return_perf_metrics` is set to true in the model configuration (a sketch of one way to do this follows the command below):
```bash
trtllm-serve models/Qwen3-8B/ --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
```
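The exact way to enable `return_perf_metrics` depends on your TensorRT-LLM version; one possible sketch, assuming `trtllm-serve` accepts an `--extra_llm_api_options` YAML file and that `return_perf_metrics` is a valid key in it, looks like this:
```bash
# Sketch only: both --extra_llm_api_options and the return_perf_metrics key
# are assumptions; verify them against your TensorRT-LLM version.
cat > extra_llm_api_options.yaml <<'EOF'
return_perf_metrics: true
EOF

trtllm-serve models/Qwen3-8B/ \
  --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" \
  --extra_llm_api_options extra_llm_api_options.yaml
```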
## Send requests and find traces in Jaeger
You can send a request to the server and view the traces in [Jaeger UI](http://localhost:16686/).
The traces should be visible under the service name "trtllm-server".
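For example, using the OpenAI-compatible endpoint that `trtllm-serve` exposes (the port and model name below are assumptions; adjust them to your deployment):
```bash
# Send a single completion request so a trace shows up in Jaeger.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "models/Qwen3-8B/", "prompt": "Hello, my name is", "max_tokens": 16}'
```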
## Configuration for Disaggregated Serving
For disaggregated serving, the configuration of the context (ctx) and generation (gen) servers remains the same as in the standalone setup above. For the proxy, you can configure it as follows:
```yaml
# disagg_config.yaml
hostname: 127.0.0.1
port: 8000
backend: pytorch
context_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8001"
generation_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8002"
otlp_config:
  otlp_traces_endpoint: "grpc://0.0.0.0:4317"
```