# OpenTelemetry Integration Guide
This guide explains how to set up OpenTelemetry tracing in TensorRT-LLM to monitor and debug your LLM inference services.
## Install OpenTelemetry
Install the required OpenTelemetry packages:
```bash
pip install \
  'opentelemetry-sdk' \
  'opentelemetry-api' \
  'opentelemetry-exporter-otlp' \
  'opentelemetry-semantic-conventions-ai'
```
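To confirm the installation before wiring up TensorRT-LLM, you can try importing the gRPC span exporter (a quick sanity check, assuming a standard Python environment):
```bash
# Should print the message below if the OTLP gRPC exporter is installed.
python -c "from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter; print('OpenTelemetry OTLP exporter available')"
```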
## Start Jaeger
You can start Jaeger with Docker:
```bash
docker run --rm --name jaeger \
  -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
  -p 6831:6831/udp \
  -p 6832:6832/udp \
  -p 5778:5778 \
  -p 16686:16686 \
  -p 4317:4317 \
  -p 4318:4318 \
  -p 14250:14250 \
  -p 14268:14268 \
  -p 14269:14269 \
  -p 9411:9411 \
  jaegertracing/all-in-one:1.57.0
```
Or run the jaeger-all-in-one(.exe) executable from [the binary distribution archives](https://www.jaegertracing.io/download/):
```bash
jaeger-all-in-one --collector.zipkin.host-port=:9411
```
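Either way, you can verify that Jaeger is up by checking the UI port before sending any traces (assuming the default ports used above):
```bash
# Expect HTTP 200 once the Jaeger UI is reachable on its default port.
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:16686/
```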
## Set up environment variables and run TensorRT-LLM
Set up the environment variables:
```bash
export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
export OTEL_SERVICE_NAME="trtllm-server"
```
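If you started Jaeger from the binary distribution instead of Docker, there is no container to inspect; in that case you can skip `JAEGER_IP` and point the exporter at localhost on the same default OTLP gRPC port:
```bash
# Jaeger running locally from the binary distribution.
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://localhost:4317
```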
Then run TensorRT-LLM with OpenTelemetry tracing enabled, making sure `return_perf_metrics` is set to true in the model configuration (a sketch of one way to do this follows the command below):
```bash
trtllm-serve models/Qwen3-8B/ --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
```
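The exact way to enable `return_perf_metrics` depends on your TensorRT-LLM version; one possible sketch, assuming `trtllm-serve` accepts an `--extra_llm_api_options` YAML file and that `return_perf_metrics` is a valid key in it, looks like this:
```bash
# Sketch only: both --extra_llm_api_options and the return_perf_metrics key
# are assumptions; verify them against your TensorRT-LLM version.
cat > extra_llm_api_options.yaml <<'EOF'
return_perf_metrics: true
EOF

trtllm-serve models/Qwen3-8B/ \
  --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" \
  --extra_llm_api_options extra_llm_api_options.yaml
```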
## Send requests and find traces in Jaeger
You can send a request to the server and view the traces in [Jaeger UI](http://localhost:16686/).
The traces should be visible under the service name "trtllm-server".
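For example, using the OpenAI-compatible endpoint that `trtllm-serve` exposes (the port and model name below are assumptions; adjust them to your deployment):
```bash
# Send a single completion request so a trace shows up in Jaeger.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "models/Qwen3-8B/", "prompt": "Hello, my name is", "max_tokens": 16}'
```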
## Configuration for Disaggregated Serving
For disaggregated serving, the configuration of the context (ctx) and generation (gen) servers remains the same as in the standalone setup above. For the proxy, you can configure it as follows:
```yaml
# disagg_config.yaml
hostname: 127.0.0.1
port: 8000
backend: pytorch
context_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8001"
generation_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8002"
otlp_config:
  otlp_traces_endpoint: "grpc://0.0.0.0:4317"
```