# OpenTelemetry Integration Guide
This guide explains how to set up OpenTelemetry tracing in TensorRT-LLM to monitor and debug your LLM inference services.
## Install OpenTelemetry
Install the required OpenTelemetry packages:
```bash
pip install \
    'opentelemetry-sdk' \
    'opentelemetry-api' \
    'opentelemetry-exporter-otlp' \
    'opentelemetry-semantic-conventions-ai'
```
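If you want to sanity-check the installation before moving on, the import paths below come from the OpenTelemetry Python SDK and the OTLP gRPC exporter installed above (optional, not required for the rest of the guide):

```bash
# List the installed OpenTelemetry distributions
pip list | grep -i opentelemetry
# Verify the tracer SDK and the OTLP gRPC span exporter can be imported
python -c "from opentelemetry.sdk.trace import TracerProvider; from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter; print('OpenTelemetry packages OK')"
```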
## Start Jaeger
You can start Jaeger with Docker:
```bash
docker run --rm --name jaeger \
    -e COLLECTOR_ZIPKIN_HOST_PORT=:9411 \
    -p 6831:6831/udp \
    -p 6832:6832/udp \
    -p 5778:5778 \
    -p 16686:16686 \
    -p 4317:4317 \
    -p 4318:4318 \
    -p 14250:14250 \
    -p 14268:14268 \
    -p 14269:14269 \
    -p 9411:9411 \
    jaegertracing/all-in-one:1.57.0
```
Or run the jaeger-all-in-one(.exe) executable from [the binary distribution archives](https://www.jaegertracing.io/download/):
```bash
jaeger-all-in-one --collector.zipkin.host-port=:9411
```
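Before wiring TensorRT-LLM to Jaeger, you can confirm the collector is reachable. A minimal check, assuming the default ports used above:

```bash
# The Jaeger UI should respond on port 16686
curl -sf http://localhost:16686/ > /dev/null && echo "Jaeger UI is reachable"
```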
## Set up environment variables and run TensorRT-LLM
Set up the environment variables:
```bash
export JAEGER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' jaeger)
export OTEL_EXPORTER_OTLP_TRACES_PROTOCOL=grpc
export OTEL_EXPORTER_OTLP_TRACES_ENDPOINT=grpc://$JAEGER_IP:4317
export OTEL_EXPORTER_OTLP_TRACES_INSECURE=true
export OTEL_SERVICE_NAME="trtllm-server"
```
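The `JAEGER_IP` lookup assumes Jaeger runs in Docker under the container name `jaeger`; if you started the standalone binary instead, point the endpoint at the host directly (for example `grpc://localhost:4317`). A quick check that the endpoint variable resolved:

```bash
# Should print something like grpc://172.17.0.2:4317 (Docker) or grpc://localhost:4317 (binary)
echo "$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
```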
Then run TensorRT-LLM with OpenTelemetry, and make sure to set `return_perf_metrics` to true in the model configuration:
```bash
trtllm-serve models/Qwen3-8B/ --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"
```
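One way to enable the `return_perf_metrics` setting mentioned above is through an extra LLM API options file. This is only a sketch: it assumes your `trtllm-serve` build supports the `--extra_llm_api_options` flag and accepts this option name, so adjust it to your version:

```bash
# Hypothetical options file enabling per-request perf metrics (assumed option name)
cat > extra_llm_options.yaml <<'EOF'
return_perf_metrics: true
EOF

trtllm-serve models/Qwen3-8B/ \
    --otlp_traces_endpoint="$OTEL_EXPORTER_OTLP_TRACES_ENDPOINT" \
    --extra_llm_api_options extra_llm_options.yaml
```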
## Send requests and find traces in Jaeger
You can send a request to the server and view the traces in [Jaeger UI](http://localhost:16686/).
The traces should be visible under the service name "trtllm-server".
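For example, a minimal completion request (a sketch assuming the server listens on the default port 8000 and that the model is addressed by the path used at launch):

```bash
curl -s http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "models/Qwen3-8B/", "prompt": "Hello, OpenTelemetry!", "max_tokens": 16}'
```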
## Configuration for Disaggregated Serving
For disaggregated serving scenarios, the configuration for the context (ctx) and generation (gen) servers remains the same as in standalone mode. For the proxy, you can configure it as follows:
```yaml
# disagg_config.yaml
hostname: 127.0.0.1
port: 8000
backend: pytorch
context_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8001"
generation_servers:
  num_instances: 1
  urls:
    - "127.0.0.1:8002"
otlp_config:
  otlp_traces_endpoint: "grpc://0.0.0.0:4317"
```
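With the OTLP endpoint set in `otlp_config`, the proxy is then started against this file. The subcommand below is an assumption and may differ between TensorRT-LLM versions:

```bash
# Hypothetical launch command for the disaggregated proxy; adjust to your version
trtllm-serve disaggregated -c disagg_config.yaml
```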