# Wide-EP SLURM Benchmark Scripts
This directory contains configuration files and utilities for benchmarking TensorRT-LLM Wide Expert Parallelism (Wide-EP) performance on SLURM-managed clusters.
## Overview
The Wide-EP benchmarking infrastructure leverages the [disaggregated serving benchmark framework](../../disaggregated/slurm/benchmark/) to evaluate MoE model performance with expert parallelism at scale. This directory provides:
- **Configuration templates** for Wide-EP deployments (`config.yaml`)
- **Post-processing utilities** for benchmark analysis (`process_gen_iterlog.py`)
### Core Implementation
The core SLURM submission and execution logic is implemented in [`examples/disaggregated/slurm/benchmark/`](../../disaggregated/slurm/benchmark/). The scripts in that directory handle:
- Job submission to SLURM clusters
- Multi-node distributed execution
- Worker initialization and coordination
- Benchmark execution and result collection
## Files in This Directory
### `config.yaml`
Example configuration file for Wide-EP benchmarks. Key sections include:
- **SLURM Configuration**: Cluster-specific settings (partition, account, job parameters)
- **Benchmark Mode**: Testing parameters (concurrency, sequence lengths, streaming mode)
- **Hardware Configuration**: GPU topology and server counts
- **Environment**: Container images, model paths, and environment variables
- **Worker Configuration**: Detailed settings for generation and context workers, including:
  - Parallelism settings (TP, EP, PP)
  - MoE configuration with load balancer settings
  - CUDA graph and KV cache configurations
  - Speculative decoding parameters
See the inline comments in [`config.yaml`](config.yaml) for detailed parameter descriptions.
### `process_gen_iterlog.py`
Post-processing script that analyzes benchmark iteration logs to generate performance reports. This script:
- Parses generation worker iteration logs
- Computes throughput and latency statistics
- Generates summary reports for benchmark results
## Usage
### Prerequisites
Before running benchmarks, ensure you have:
1. **SLURM Cluster Access**: Valid account and partition allocation
2. **Container Environment**:
   - NVIDIA Container Toolkit configured
   - Required device mappings (e.g., `/dev/nvidia-caps-imex-channels` for GB200/GB300 NVL72, `/dev/gdrdrv` for GDRCopy)
3. **Model Files**: Checkpoint files accessible from all cluster nodes
4. **Configuration**: Updated `config.yaml` with your cluster-specific settings
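You can optionally sanity-check these prerequisites from a login node before editing the configuration. The commands below are an illustrative sketch, not part of the benchmark scripts; `<partition>`, `<account>`, and `<model_path>` are placeholders for your own values.
```bash
# Illustrative checks only; adjust placeholders to your cluster.

# Confirm your SLURM partition and account are visible
sinfo -p <partition>
sacctmgr show associations user=$USER format=Account,Partition

# On a compute node: device mappings listed above and model checkpoint access
srun -p <partition> -A <account> -N 1 \
    ls -l /dev/nvidia-caps-imex-channels /dev/gdrdrv <model_path>
```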
### Configuration Setup
1. Copy and customize the example configuration:
```bash
cp config.yaml my_benchmark_config.yaml
```
2. Update the following required fields in `my_benchmark_config.yaml`:
   - `slurm.partition`: Your SLURM partition name
   - `slurm.account`: Your SLURM account
   - `environment.container_image`: Path to your TensorRT-LLM container
   - `environment.model_path`: Path to your model checkpoint
   - `environment.work_dir`: Working directory for benchmark outputs
   - `environment.container_mount`: Mount paths for the container
3. Adjust hardware configuration to match your setup:
   - `hardware.gpus_per_node`: GPUs available per node
   - `hardware.num_ctx_servers`: Number of context processing servers
   - `hardware.num_gen_servers`: Number of generation servers
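For orientation, a minimal sketch of how these fields are nested is shown below. It is illustrative only: the values are placeholders, `gpus_per_node` depends on your node type, and the authoritative key names, defaults, and comments are in [`config.yaml`](config.yaml) itself.
```yaml
# Illustrative excerpt only; see config.yaml for the complete configuration.
slurm:
  partition: <partition>
  account: <account>
environment:
  container_image: <path_or_uri_to_trtllm_container>
  model_path: /path/to/model/checkpoint
  work_dir: /path/to/benchmark/outputs
  container_mount: /host/path:/container/path
hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 1
```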
### Running Benchmarks
Submit a benchmark job using the `submit.py` script from the disaggregated benchmark directory:
```bash
# Navigate to the benchmark submission directory
cd ../../disaggregated/slurm/benchmark/
# Submit the job with your configuration
python3 submit.py -c ../../../wide_ep/slurm_scripts/my_benchmark_config.yaml
```
The script will:
1. Validate your configuration
2. Submit a SLURM job with the specified parameters
3. Launch distributed workers across the allocated nodes
4. Execute the benchmark workload
5. Collect results in the specified working directory
### Monitoring and Results
After submission, monitor your job:
```bash
# Check job status
squeue -u $USER
# View job output (replace <job_id> with your SLURM job ID)
tail -f slurm-<job_id>.out
# Check worker logs in the working directory
ls <work_dir>/logs/
```
Benchmark results will be saved in your configured `work_dir`, including:
- Iteration logs from generation and context workers
- Performance metrics and throughput statistics
- System logs and error reports
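Before post-processing, it can be useful to scan the worker logs for failures. The commands below are a generic example; the exact file names under `<work_dir>/logs/` depend on your configuration.
```bash
# List any log files that mention errors or tracebacks (case-insensitive)
grep -ril --include='*.log' -e error -e traceback <work_dir>/logs/

# Get an overview of what the run wrote to the working directory
find <work_dir> -maxdepth 2 -type d | sort
```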
### Post-Processing Results
Process generation iteration logs to extract performance metrics:
```bash
python3 process_gen_iterlog.py <path_to_gen_iter_log>
```
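If a run produced iteration logs from multiple generation workers, you can process them in a loop. The glob below is only a hypothetical example; use whatever naming your run actually produced under `<work_dir>/logs/`.
```bash
# Hypothetical file pattern; substitute the real iteration log names from your run
for log in <work_dir>/logs/gen_*iter*.log; do
    echo "== ${log} =="
    python3 process_gen_iterlog.py "${log}"
done
```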