# TensorRT-LLM Disaggregated Benchmark Framework
A YAML-based testing framework for TensorRT-LLM disaggregated serving performance and accuracy benchmarks.
## Overview
This framework provides a simple, maintainable approach to benchmark testing using YAML configuration files. Each test configuration is defined in a separate YAML file, with automatic test discovery and execution through pytest.
## Key Features
- **YAML Configuration**: Each test has its own independent YAML configuration file
- **Automatic Test Discovery**: Tests are automatically discovered from the config directory structure
- **Default Metrics**: Built-in default metrics configuration for common test scenarios
- **GPU Filtering**: Automatically filters tests based on hardware compatibility
- **Flexible Override**: Override default configurations as needed for special cases
- **Test Categories**: Support for both performance (perf) and accuracy tests
- **Multiple Test Types**: Support for disagg (disaggregated) and wideep architectures
## Directory Structure
```
test_configs/
├── disagg/              # Disaggregated serving tests
│   ├── perf/            # Performance tests
│   └── accuracy/        # Accuracy tests (optional)
└── wideep/              # Wide-deep tests
    ├── perf/
    └── accuracy/
```
## YAML Configuration
### Minimal Configuration Example
```yaml
metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

slurm:
  partition: "<partition>"
  account: "<account>"
  job_time: "02:00:00"

benchmark:
  mode: "e2e"
  streaming: true
  concurrency_list: "1 2 4 8 16 36"
  input_length: 1024
  output_length: 1024

hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 4

environment:
  container_mount: "<container_mount>"
  container_image: "<container_image>"
  model_path: "<model_path>"

worker_config:
  gen:
    tensor_parallel_size: 8
    moe_expert_parallel_size: 8
    max_batch_size: 32
    max_num_tokens: 32
    max_seq_len: 2251
    # ... other gen worker configs
  ctx:
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    max_batch_size: 4
    max_num_tokens: 4608
    max_seq_len: 2251
    # ... other ctx worker configs
```
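For reference, a config like this can be inspected directly with PyYAML. The sketch below is illustrative only (the file path is one of the example names from this README, not a guaranteed location) and is not the framework's ConfigLoader:
```python
# Minimal sketch: load one test config with PyYAML and read a few fields.
# The path below reuses an example filename from this README for illustration.
import yaml

with open("test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml") as f:
    config = yaml.safe_load(f)

print(config["metadata"]["model_name"])                        # deepseek-r1-fp4
print(config["worker_config"]["gen"]["tensor_parallel_size"])  # 8
```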
### Custom Metrics (Optional)
Most tests use default metrics. To customize:
```yaml
benchmark:
  metrics:
    log_file: "custom_benchmark.log"
    extractor_pattern: 'Custom Pattern:\s+([0-9.]+)'
    metric_names: ["CUSTOM_METRIC"]
```
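As a rough sketch of how such a pattern is applied, the snippet below scans a log file and records a value for each metric name. The matching loop is illustrative, not the framework's LogParser implementation; the log file name and pattern simply mirror the YAML example above.
```python
# Sketch: apply an extractor_pattern to a benchmark log and collect named metrics.
import re

pattern = re.compile(r"Custom Pattern:\s+([0-9.]+)")
metric_names = ["CUSTOM_METRIC"]

metrics = {}
with open("custom_benchmark.log") as log:
    for line in log:
        match = pattern.search(line)
        if match:
            # One capture group per metric name, in order.
            for name, value in zip(metric_names, match.groups()):
                metrics[name] = float(value)

print(metrics)  # e.g. {'CUSTOM_METRIC': 123.4}
```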
## GPU Support
Currently supports **OCI GB200** only. The framework is designed to support additional GPU types in the future.
All configurations must specify:
```yaml
metadata:
  supported_gpus: ["GB200"]
```
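A hedged sketch of the filtering idea: a test is collected only when the current GPU type appears in the config's `supported_gpus`. The helper name below is hypothetical; the `GPU_TYPE` variable and its `GB200` default are documented under Environment Variables.
```python
# Sketch of the GPU filtering check; is_supported() is a hypothetical helper,
# not the framework's actual API.
import os

def is_supported(config: dict) -> bool:
    gpu_type = os.environ.get("GPU_TYPE", "GB200")  # GPU_TYPE defaults to GB200
    return gpu_type in config["metadata"]["supported_gpus"]
```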
## Configuration Validation
The framework validates configurations before execution:
1. **gen_max_tokens**: Must equal `gen_max_batch_size * (mtp_size + 1)` when MTP is enabled
2. **streaming**: Must be `true`
3. **max_seq_len**: Must be greater than `input_length + output_length` for both ctx and gen workers
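The sketch below restates these rules as plain assertions. It assumes `gen_max_tokens`/`gen_max_batch_size` refer to the gen worker's `max_num_tokens`/`max_batch_size`, and that the MTP depth lives under the gen worker's `speculative_config` (the exact field name is an assumption); it is not the framework's ConfigValidator code.
```python
# Illustrative restatement of the three validation rules; the field access paths
# are assumptions based on the YAML example earlier in this README.
def check_config(config: dict) -> None:
    bench = config["benchmark"]
    ctx = config["worker_config"]["ctx"]
    gen = config["worker_config"]["gen"]

    # Rule 2: streaming must be enabled.
    assert bench["streaming"] is True, "streaming must be true"

    # Rule 1: with MTP enabled, gen max_num_tokens == max_batch_size * (mtp_size + 1).
    # The speculative_config field name below is assumed for illustration.
    mtp_size = gen.get("speculative_config", {}).get("num_nextn_predict_layers", 0)
    if mtp_size > 0:
        expected = gen["max_batch_size"] * (mtp_size + 1)
        assert gen["max_num_tokens"] == expected, f"gen max_num_tokens should be {expected}"

    # Rule 3: both workers must fit the full request.
    total = bench["input_length"] + bench["output_length"]
    assert ctx["max_seq_len"] > total and gen["max_seq_len"] > total, \
        "max_seq_len must exceed input_length + output_length"
```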
## Running Tests
### Run all tests
```bash
poetry run pytest --disagg test_disagg.py -s -vv
```
### Run from test list
```bash
poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/disagg.txt
```
### Run specific tests
```bash
# Run only performance tests
poetry run pytest --disagg test_disagg.py -s -vv -m perf
# Run only accuracy tests
poetry run pytest --disagg test_disagg.py -s -vv -m accuracy
# Run specific test by ID
poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"
```
## Test Naming Convention
Tests are automatically named using the format:
```
{test_type}_{category}_{config_filename}
```
Example: `disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL`
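A one-line sketch of how such an ID is assembled (the helper name is hypothetical, not part of the framework):
```python
# Hypothetical helper: build a test ID from its three parts, per the format above.
def build_test_id(test_type: str, category: str, config_filename: str) -> str:
    stem = config_filename.removesuffix(".yaml")
    return f"{test_type}_{category}_{stem}"

# build_test_id("disagg", "perf", "deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml")
# -> "disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL"
```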
## File Naming Convention
Configuration files should follow this format:
```
{model}_{benchmark_type}_{config_details}.yaml
```
Examples:
- `deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml`
- `deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml`
Where:
- `1k1k`, `8k1k`: Input/output lengths (1024/1024, 8192/1024)
- `ctx1_gen1`: Context and generation server counts
- `dep32` or `tep8`: Data parallel (dep) or tensor parallel (tep) configuration
- `bs32`: Batch size
- `eplb0`: Expert parallel load balancing slots
- `mtp0`: Multi-token prediction layers
- `ccb-NIXL` or `ccb-UCX`: Communication backend
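For reference, these components can be pulled apart by splitting on underscores, assuming the model name itself uses only hyphens (true of the examples above). The helper below is hypothetical, not part of the framework:
```python
# Hypothetical helper: split a config filename into its labeled parts.
# Assumes the model name contains no underscores, as in the examples above.
def parse_config_name(filename: str) -> dict:
    stem = filename.removesuffix(".yaml")
    model, lengths, ctx, gen, parallel, bs, eplb, mtp, ccb = stem.split("_")
    return {
        "model": model,           # deepseek-r1-fp4
        "lengths": lengths,       # 1k1k -> 1024 input / 1024 output
        "ctx_servers": ctx,       # ctx1
        "gen_servers": gen,       # gen1
        "parallelism": parallel,  # dep32 or tep8
        "batch_size": bs,         # bs32
        "eplb_slots": eplb,       # eplb0
        "mtp_layers": mtp,        # mtp0
        "comm_backend": ccb,      # ccb-NIXL or ccb-UCX
    }
```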
## Key Configuration Fields
### Metadata
- `model_name`: Model identifier
- `precision`: Model precision (fp4, fp8, etc.)
- `supported_gpus`: List of compatible GPU types
### Benchmark
- `mode`: Benchmark mode (e2e, gen_only, ctx_only)
- `streaming`: Enable streaming (must be true)
- `input_length`, `output_length`: Sequence lengths
- `concurrency_list`: Concurrency levels to test
### Worker Config
- `tensor_parallel_size`: Tensor parallelism degree
- `moe_expert_parallel_size`: MoE expert parallelism
- `max_batch_size`: Maximum batch size
- `max_num_tokens`: Maximum tokens per batch
- `max_seq_len`: Maximum sequence length
- `speculative_config`: Multi-token prediction settings (optional)
## Test Output
Test results are saved to:
- Performance metrics: `{OUTPUT_PATH}/perf_script_test_results.csv`
- Test logs: `{OUTPUT_PATH}/disagg_benchmark_{timestamp}.log`
## Environment Variables
- `GPU_TYPE`: Current GPU type (default: GB200)
- `OUTPUT_PATH`: Directory for test results and logs
- `WORK_DIR`: Working directory for benchmark execution
- `DEBUG_MODE`: Enable debug mode (set to "1" to skip job submission)
- `DEBUG_JOB_ID`: Job ID to use in debug mode
## Debug Mode
For local testing without SLURM submission:
```bash
export DEBUG_MODE=1
export DEBUG_JOB_ID=12345
poetry run pytest --disagg test_disagg.py -s -vv
```
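In effect, when `DEBUG_MODE` is `"1"` no SLURM job is submitted and `DEBUG_JOB_ID` is used to locate existing logs instead. The function below sketches that behavior for illustration; the real JobManager API may differ.
```python
# Illustrative only: skip submission in debug mode and reuse an existing job ID.
import os

def resolve_job_id(submit_fn) -> str:
    if os.environ.get("DEBUG_MODE") == "1":
        return os.environ["DEBUG_JOB_ID"]  # reuse logs from a previous run
    return submit_fn()  # normal path: submit the SLURM job and return its ID
```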
## Architecture
The framework consists of:
1. **ConfigLoader**: Scans and loads YAML configurations
2. **ConfigValidator**: Validates configuration correctness
3. **JobManager**: Handles SLURM job submission and monitoring
4. **LogParser**: Extracts metrics from benchmark logs
5. **TestCaseTracker**: Tracks test execution timing
6. **ResultSaver**: Saves results to CSV
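Roughly, the components fit together as sketched below. The method names and call signatures are assumptions used only to show the data flow, not the framework's exact API.
```python
# Illustrative data flow through the six components; all method names here are
# assumptions, shown only to indicate how the pieces connect.
def run_all(config_loader, validator, job_manager, log_parser, tracker, result_saver):
    for config in config_loader.load_all():           # 1. discover YAML configs
        validator.validate(config)                    # 2. fail fast on bad configs
        tracker.start(config)                         # 5. time the test case
        job_id = job_manager.submit_and_wait(config)  # 3. SLURM submit + monitor
        metrics = log_parser.extract(job_id)          # 4. pull metrics from logs
        tracker.stop(config)
        result_saver.save(config, metrics)            # 6. append a row to the CSV
```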
## Benefits
- **Simple**: YAML-based configuration, no code changes needed
- **Maintainable**: Each test is a separate file
- **Flexible**: Override defaults only when needed
- **Scalable**: Easy to add new tests and models
- **Reliable**: Automatic validation before execution
- **Traceable**: Comprehensive logging and result tracking
## Adding New Tests
1. Create a new YAML file in `test_configs/{test_type}/{category}/`
2. Configure the test parameters
3. Run pytest - the test will be automatically discovered
No code changes required!
---
For detailed configuration options and advanced usage, refer to the inline comments in the YAML configuration files.