# TensorRT-LLM Disaggregated Benchmark Framework
A YAML-based testing framework for TensorRT-LLM disaggregated serving performance and accuracy benchmarks.
## Overview
This framework provides a simple, maintainable approach to benchmark testing using YAML configuration files. Each test configuration is defined in a separate YAML file, with automatic test discovery and execution through pytest.
## Key Features
- **YAML Configuration**: Each test has its own independent YAML configuration file
- **Automatic Test Discovery**: Tests are automatically discovered from the config directory structure
- **Default Metrics**: Built-in default metrics configuration for common test scenarios
- **GPU Filtering**: Automatically filters tests based on hardware compatibility
- **Flexible Override**: Override default configurations as needed for special cases
- **Test Categories**: Support for both performance (perf) and accuracy tests
- **Multiple Test Types**: Support for disagg (disaggregated) and wideep architectures
## Directory Structure
```
test_configs/
├── disagg/          # Disaggregated serving tests
│   ├── perf/        # Performance tests
│   └── accuracy/    # Accuracy tests (optional)
└── wideep/          # Wide-deep tests
    ├── perf/
    └── accuracy/
```
## YAML Configuration
### Minimal Configuration Example
```yaml
metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

slurm:
  partition: "<partition>"
  account: "<account>"
  job_time: "02:00:00"

benchmark:
  mode: "e2e"
  streaming: true
  concurrency_list: "1 2 4 8 16 36"
  input_length: 1024
  output_length: 1024

hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 4

environment:
  container_mount: "<container_mount>"
  container_image: "<container_image>"
  model_path: "<model_path>"

worker_config:
  gen:
    tensor_parallel_size: 8
    moe_expert_parallel_size: 8
    max_batch_size: 32
    max_num_tokens: 32
    max_seq_len: 2251
    # ... other gen worker configs
  ctx:
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    max_batch_size: 4
    max_num_tokens: 4608
    max_seq_len: 2251
    # ... other ctx worker configs
```
### Custom Metrics (Optional)
Most tests use default metrics. To customize:
```yaml
benchmark:
  metrics:
    log_file: "custom_benchmark.log"
    extractor_pattern: 'Custom Pattern:\s+([0-9.]+)'
    metric_names: ["CUSTOM_METRIC"]
```
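As a rough illustration of how such a pattern pairs with its metric names, the sketch below applies an `extractor_pattern` to log text with Python's `re` module. The `parse_metrics` helper is hypothetical and assumes one capture group per metric name; the actual LogParser may work differently.

```python
# Hypothetical sketch (not the actual LogParser) of applying an
# extractor_pattern / metric_names pair to benchmark log text.
import re
from typing import Dict, List


def parse_metrics(log_text: str, extractor_pattern: str,
                  metric_names: List[str]) -> Dict[str, float]:
    """Map each regex capture group to the corresponding metric name."""
    results: Dict[str, float] = {}
    for match in re.finditer(extractor_pattern, log_text):
        # Assumes one capture group per metric name, in order.
        for name, value in zip(metric_names, match.groups()):
            results[name] = float(value)
    return results


# Example: extract "Custom Pattern: 123.45" into {"CUSTOM_METRIC": 123.45}
print(parse_metrics("Custom Pattern: 123.45",
                    r"Custom Pattern:\s+([0-9.]+)",
                    ["CUSTOM_METRIC"]))
```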
## GPU Support
Currently supports **OCI GB200** only. The framework is designed to support additional GPU types in the future.
All configurations must specify:
```yaml
metadata:
  supported_gpus: ["GB200"]
```
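A minimal sketch of how this hardware filter can work, assuming the `GPU_TYPE` environment variable described later and an already-loaded config dict; `is_supported` is an illustrative name, not the framework's API:

```python
# Hypothetical sketch of the GPU-compatibility filter: a config only runs
# when the current GPU_TYPE appears in its supported_gpus list.
import os


def is_supported(config: dict) -> bool:
    gpu_type = os.environ.get("GPU_TYPE", "GB200")  # default per this README
    return gpu_type in config.get("metadata", {}).get("supported_gpus", [])


config = {"metadata": {"supported_gpus": ["GB200"]}}
print(is_supported(config))  # True on GB200, False otherwise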
## Configuration Validation
The framework validates configurations before execution:
1. **gen_max_tokens**: Must equal `gen_max_batch_size * (mtp_size + 1)` when MTP is enabled
2. **streaming**: Must be `true`
3. **max_seq_len**: For both ctx and gen workers, must be greater than `input_length + output_length`
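The sketch below restates these three rules as assertions over a loaded config dict. It is illustrative only: the key used to read `mtp_size` from `speculative_config` is a placeholder, and the real ConfigValidator may structure its checks and error reporting differently.

```python
# Illustrative sketch of the three validation rules above (not ConfigValidator).
def validate(config: dict) -> None:
    bench = config["benchmark"]
    gen = config["worker_config"]["gen"]
    ctx = config["worker_config"]["ctx"]

    # Rule 2: streaming must be enabled.
    assert bench["streaming"] is True, "streaming must be true"

    # Rule 3: ctx and gen max_seq_len must cover input + output lengths.
    required = bench["input_length"] + bench["output_length"]
    assert gen["max_seq_len"] > required and ctx["max_seq_len"] > required, \
        "max_seq_len must exceed input_length + output_length"

    # Rule 1: with MTP enabled, gen max_num_tokens must equal
    # max_batch_size * (mtp_size + 1). "mtp_size" is a placeholder key;
    # the real field lives under speculative_config.
    mtp_size = gen.get("speculative_config", {}).get("mtp_size", 0)
    if mtp_size > 0:
        assert gen["max_num_tokens"] == gen["max_batch_size"] * (mtp_size + 1), \
            "gen max_num_tokens must equal max_batch_size * (mtp_size + 1)"
```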
## Running Tests
### Run all tests
```bash
poetry run pytest --disagg test_disagg.py -s -vv
```
### Run from test list
```bash
poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/disagg.txt
```
### Run specific tests
```bash
# Run only performance tests
poetry run pytest --disagg test_disagg.py -s -vv -m perf
# Run only accuracy tests
poetry run pytest --disagg test_disagg.py -s -vv -m accuracy
# Run specific test by ID
poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"
```
## Batch Job Submission
The framework supports automatic batch job submission to maximize parallelism in SLURM cluster environments. Instead of submitting jobs one-by-one, it groups test cases into batches and submits entire batches when needed.
### Quick Start
**Default batch size (5 jobs per batch):**
```bash
# Run all tests with default batching
poetry run pytest --disagg test_disagg.py -s -vv
# Run with test list
poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/all.txt
```
**Custom batch size:**
```bash
# Set batch size via command line
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=10
# Set batch size via environment variable
export DISAGG_BATCH_SIZE=20
poetry run pytest --disagg test_disagg.py -s -vv
# Submit all jobs at once (unlimited batch)
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=0
```
### How Batch Submission Works
```
Pytest Collection Phase:
  - Collects all test cases (e.g., 100 tests)
  - BatchManager splits them into batches (e.g., 20 batches of 5)

Pytest Execution Phase:
  Test 0 runs:
    -> Triggers submission of Batch 0 (jobs 0-4)
    -> Waits for job 0 to complete
  Tests 1-4 run:
    -> Batch 0 already submitted, directly wait for completion
  Test 5 runs:
    -> Triggers submission of Batch 1 (jobs 5-9)
    -> Waits for job 5 to complete
  ... and so on
```
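The following sketch captures this lazy-submission idea under stated assumptions: `BatchManagerSketch`, `ensure_submitted`, and the `submit_job` callback are illustrative names, not the framework's actual BatchManager/JobManager API.

```python
# Illustrative sketch of lazy batch submission (not the real BatchManager).
# submit_job() stands in for the SLURM submission done by JobManager.
from typing import Callable, Dict, List


class BatchManagerSketch:
    def __init__(self, configs: List[str], batch_size: int,
                 submit_job: Callable[[str], int]):
        # batch_size == 0 means "submit everything in one batch".
        self.configs = configs
        self.batch_size = batch_size or len(configs)
        self.submit_job = submit_job
        self.job_ids: Dict[int, int] = {}   # test index -> SLURM job id
        self.submitted_batches: set = set()

    def ensure_submitted(self, test_index: int) -> int:
        """Submit the whole batch containing test_index (once), return its job id."""
        batch = test_index // self.batch_size
        if batch not in self.submitted_batches:
            start = batch * self.batch_size
            end = min(start + self.batch_size, len(self.configs))
            for i in range(start, end):
                self.job_ids[i] = self.submit_job(self.configs[i])
            self.submitted_batches.add(batch)
        return self.job_ids[test_index]


# Usage: with batch_size=5, asking for test 0 submits jobs 0-4 at once;
# tests 1-4 then reuse the already-submitted job ids.
mgr = BatchManagerSketch([f"cfg{i}" for i in range(12)], 5,
                         submit_job=lambda cfg: hash(cfg) % 10000)  # fake job ids
print(mgr.ensure_submitted(0), mgr.ensure_submitted(3))
```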
### Key Benefits
- **Parallel Execution**: All jobs in a batch run simultaneously on SLURM cluster
- **Reduced Wait Time**: Total time ≈ MAX(job time) instead of SUM(job times)
- **Automatic Management**: No need to manually split test lists
- **Lazy Loading**: Only submits batches when needed
### Configuration Options
**Priority**: Command line option > Environment variable > Default (5)
**Examples:**
```bash
# Small batch for quick testing
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=3 \
--disagg-test-list=./testlist/debug.txt
# Large batch for production
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=50 \
--disagg-test-list=./testlist/all.txt
# Submit all at once
poetry run pytest --disagg test_disagg.py -s -vv --disagg-batch-size=0
```
### Timeout Configuration
The default timeout for waiting for job completion is **10 hours (36000 seconds)**, which accounts for:
- SLURM queue wait time
- Job execution time
- Buffer for delays
### Performance Comparison
**Before (Sequential Submission):**
```
Case 1: submit + wait (1.5h) = 1.5h
Case 2: submit + wait (1.5h) = 1.5h
Case 3: submit + wait (1.5h) = 1.5h
...
Total: 50 × 1.5h = 75 hours
```
**After (Batch Submission, batch_size=50):**
```
Batch 0 (50 jobs): submitted in parallel
Case 1: wait (1.5h)
Cases 2-50: wait (0s, already done)
Total: ~1.5 hours
```
**Speedup: 50x**
### Troubleshooting
**Check BatchManager initialization:**
```
======================================================================
Batch Manager Initialized
Batch size: 5 jobs per batch
======================================================================
Total test configs: 20
Total batches: 4
```
**Monitor batch submission:**
```
======================================================================
Submitting Batch 0
Range: [0:5] (5 jobs)
======================================================================
[ 1/5] Job 1234 <- test_config_id_1
[ 2/5] Job 1235 <- test_config_id_2
...
```
**If jobs timeout frequently:**
- Check SLURM queue status
- Consider reducing batch size to avoid resource contention
- Verify that timeout (36000s) is sufficient for your workload
## Test Naming Convention
Tests are automatically named using the format:
```
{test_type}_{category}_{config_filename}
```
Example: `disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL`
## File Naming Convention
Configuration files should follow this format:
```
{model}_{benchmark_type}_{config_details}.yaml
```
Examples:
- `deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml`
- `deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml`
Where:
- `1k1k`, `8k1k`: Input/output lengths (1024/1024, 8192/1024)
- `ctx1_gen1`: Context and generation server counts
- `dep32` or `tep8`: Data parallel (dep) or tensor parallel (tep) configuration
- `bs32`: Batch size
- `eplb0`: Expert parallel load balancing slots
- `mtp0`: Multi-token prediction layers
- `ccb-NIXL` or `ccb-UCX`: Communication backend
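As a worked example of this convention, the hypothetical parser below splits a config filename into its fields; the regex is illustrative and not part of the framework.

```python
# Hypothetical parser for the config filename convention described above.
import re

PATTERN = re.compile(
    r"(?P<model>.+)_(?P<isl_osl>\d+k\d+k)_ctx(?P<ctx>\d+)_gen(?P<gen>\d+)"
    r"_(?P<parallel>(?:dep|tep)\d+)_bs(?P<bs>\d+)_eplb(?P<eplb>\d+)"
    r"_mtp(?P<mtp>\d+)_ccb-(?P<ccb>\w+)\.yaml"
)

name = "deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml"
print(PATTERN.match(name).groupdict())
# {'model': 'deepseek-r1-fp4', 'isl_osl': '1k1k', 'ctx': '1', 'gen': '1',
#  'parallel': 'dep32', 'bs': '32', 'eplb': '0', 'mtp': '0', 'ccb': 'NIXL'}
```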
## Key Configuration Fields
### Metadata
- `model_name`: Model identifier
- `precision`: Model precision (fp4, fp8, etc.)
- `supported_gpus`: List of compatible GPU types
### Benchmark
- `mode`: Benchmark mode (e2e, gen_only, ctx_only)
- `streaming`: Enable streaming (must be true)
- `input_length`, `output_length`: Sequence lengths
- `concurrency_list`: Concurrency levels to test
### Worker Config
- `tensor_parallel_size`: Tensor parallelism degree
- `moe_expert_parallel_size`: MoE expert parallelism
- `max_batch_size`: Maximum batch size
- `max_num_tokens`: Maximum tokens per batch
- `max_seq_len`: Maximum sequence length
- `speculative_config`: Multi-token prediction settings (optional)
## Test Output
Test results are saved to:
- Performance metrics: `{OUTPUT_PATH}/perf_script_test_results.csv`
- Test logs: `{OUTPUT_PATH}/disagg_benchmark_{timestamp}.log`
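For quick inspection of the metrics CSV, a minimal sketch using the Python standard library is shown below; the column names depend on the metrics each test emits, so rows are printed generically.

```python
# Minimal sketch for inspecting the results CSV written under OUTPUT_PATH.
import csv
import os

results_path = os.path.join(os.environ["OUTPUT_PATH"],
                            "perf_script_test_results.csv")
with open(results_path, newline="") as f:
    for row in csv.DictReader(f):
        print(row)
```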
## Environment Variables
- `GPU_TYPE`: Current GPU type (default: GB200)
- `OUTPUT_PATH`: Directory for test results and logs
- `WORK_DIR`: Working directory for benchmark execution
- `DISAGG_BATCH_SIZE`: Default batch size for job submission (default: 5)
- `DEBUG_MODE`: Enable debug mode (set to "1" to skip job submission)
- `DEBUG_JOB_ID`: Job ID to use in debug mode
## Debug Mode
For local testing without SLURM submission:
```bash
export DEBUG_MODE=1
export DEBUG_JOB_ID=12345
poetry run pytest --disagg test_disagg.py -s -vv
```
## Architecture
The framework consists of:
1. **ConfigLoader**: Scans and loads YAML configurations
2. **ConfigValidator**: Validates configuration correctness
3. **BatchManager**: Manages batch job submission for parallel execution
4. **JobManager**: Handles SLURM job submission and monitoring
5. **LogParser**: Extracts metrics from benchmark logs
6. **TestCaseTracker**: Tracks test execution timing
7. **ResultSaver**: Saves results to CSV
## Benefits
- **Simple**: YAML-based configuration, no code changes needed
- **Maintainable**: Each test is a separate file
- **Flexible**: Override defaults only when needed
- **Scalable**: Easy to add new tests and models
- **Reliable**: Automatic validation before execution
- **Traceable**: Comprehensive logging and result tracking
## Adding New Tests
1. Create a new YAML file in `test_configs/{test_type}/{category}/`
2. Configure the test parameters
3. Run pytest - the test will be automatically discovered
No code changes required!
---
For detailed configuration options and advanced usage, refer to the inline comments in the YAML configuration files.