Wide-EP SLURM Benchmark Scripts

This directory contains configuration files and utilities for benchmarking TensorRT-LLM Wide Expert Parallelism (Wide-EP) performance on SLURM-managed clusters.

Overview

The Wide-EP benchmarking infrastructure leverages the disaggregated serving benchmark framework to evaluate MoE model performance with expert parallelism at scale. This directory provides:

  • Configuration templates for Wide-EP deployments (config.yaml)
  • Post-processing utilities for benchmark analysis (process_gen_iterlog.py)

Core Implementation

The core SLURM submission and execution logic is implemented in examples/disaggregated/slurm/benchmark/. The scripts in that directory handle:

  • Job submission to SLURM clusters
  • Multi-node distributed execution
  • Worker initialization and coordination
  • Benchmark execution and result collection

Files in This Directory

config.yaml

Example configuration file for Wide-EP benchmarks. Key sections include:

  • SLURM Configuration: Cluster-specific settings (partition, account, job parameters)
  • Benchmark Mode: Testing parameters (concurrency, sequence lengths, streaming mode)
  • Hardware Configuration: GPU topology and server counts
  • Environment: Container images, model paths, and environment variables
  • Worker Configuration: Detailed settings for generation and context workers, including:
    • Parallelism settings (TP, EP, PP)
    • MoE configuration with load balancer settings
    • CUDA graph and KV cache configurations
    • Speculative decoding parameters

See the inline comments in config.yaml for detailed parameter descriptions.
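
For orientation, a worker section in this style of config might look something like the sketch below. This is a minimal illustration only: the grouping under a worker_config key and the specific values are assumptions, and the engine option names are modeled on the TensorRT-LLM LLM API YAML options rather than copied from this config.yaml, whose inline comments remain the authoritative reference.

# Illustrative sketch only -- grouping and key names are assumptions, not the exact config.yaml schema.
worker_config:
  gen:                                # generation workers
    tensor_parallel_size: 8           # TP
    moe_expert_parallel_size: 8       # EP: Wide-EP spreads experts across many GPUs
    pipeline_parallel_size: 1         # PP
    moe_config:
      load_balancer: moe_load_balancer.yaml   # hypothetical path to load balancer settings
    cuda_graph_config:
      enable_padding: true
    kv_cache_config:
      free_gpu_memory_fraction: 0.7
    speculative_config:
      decoding_type: MTP              # e.g. multi-token prediction
  ctx:                                # context workers follow the same shape,
    tensor_parallel_size: 4           # typically with different parallelism and cache settings
    kv_cache_config:
      free_gpu_memory_fraction: 0.75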

process_gen_iterlog.py

Post-processing script that analyzes benchmark iteration logs to generate performance reports. This script:

  • Parses generation worker iteration logs
  • Computes throughput and latency statistics
  • Generates summary reports for benchmark results

Usage

Prerequisites

Before running benchmarks, ensure you have:

  1. SLURM Cluster Access: Valid account and partition allocation
  2. Container Environment:
    • NVIDIA Container Toolkit configured
    • Required device mappings (e.g., /dev/nvidia-caps-imex-channels for GB200/GB300 NVL72, /dev/gdrdrv for GDRCopy)
  3. Model Files: Checkpoint files accessible from all cluster nodes
  4. Configuration: Updated config.yaml with your cluster-specific settings

Configuration Setup

  1. Copy and customize the example configuration:
cp config.yaml my_benchmark_config.yaml
  2. Update the following required fields in my_benchmark_config.yaml (an illustrative example follows this list):

    • slurm.partition: Your SLURM partition name
    • slurm.account: Your SLURM account
    • environment.container_image: Path to your TensorRT-LLM container
    • environment.model_path: Path to your model checkpoint
    • environment.work_dir: Working directory for benchmark outputs
    • environment.container_mount: Mount paths for the container
  3. Adjust hardware configuration to match your setup:

    • hardware.gpus_per_node: GPUs available per node
    • hardware.num_ctx_servers: Number of context processing servers
    • hardware.num_gen_servers: Number of generation servers
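
As an illustration, the cluster-specific fields from steps 2 and 3 might end up looking like the snippet below. Every value is a placeholder to replace with your own cluster details; the nesting follows the dotted field names listed above.

slurm:
  partition: batch                                    # your SLURM partition
  account: my_account                                 # your SLURM account
environment:
  container_image: /shared/containers/trtllm.sqsh    # placeholder container image path/URI
  model_path: /shared/models/my_model                # checkpoint visible from all nodes
  work_dir: /shared/benchmarks/wide_ep_run1          # benchmark outputs are written here
  container_mount: /shared:/shared                   # host:container mount path(s)
hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 1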

Running Benchmarks

Submit a benchmark job using the submit.py script from the disaggregated benchmark directory:

# Navigate to the benchmark submission directory
cd ../../disaggregated/slurm/benchmark/

# Submit the job with your configuration
python3 submit.py -c ../../../wide_ep/slurm_scripts/my_benchmark_config.yaml

The script will:

  1. Validate your configuration
  2. Submit a SLURM job with the specified parameters
  3. Launch distributed workers across the allocated nodes
  4. Execute the benchmark workload
  5. Collect results in the specified working directory

Monitoring and Results

After submission, monitor your job:

# Check job status
squeue -u $USER

# View job output (replace <job_id> with your SLURM job ID)
tail -f slurm-<job_id>.out

# Check worker logs in the working directory
ls <work_dir>/logs/

Benchmark results will be saved in your configured work_dir, including:

  • Iteration logs from generation and context workers
  • Performance metrics and throughput statistics
  • System logs and error reports

Post-Processing Results

Process generation iteration logs to extract performance metrics:

python3 process_gen_iterlog.py <path_to_gen_iter_log>