# TensorRT-LLM Disaggregated Benchmark Framework
A YAML-based testing framework for TensorRT-LLM disaggregated serving performance and accuracy benchmarks.
## Overview
This framework provides a simple, maintainable approach to benchmark testing using YAML configuration files. Each test configuration is defined in a separate YAML file, with automatic test discovery and execution through pytest.
## Key Features
- YAML Configuration: Each test has its own independent YAML configuration file
- Automatic Test Discovery: Tests are automatically discovered from the config directory structure
- Default Metrics: Built-in default metrics configuration for common test scenarios
- GPU Filtering: Automatically filters tests based on hardware compatibility
- Flexible Override: Override default configurations as needed for special cases
- Test Categories: Support for both performance (perf) and accuracy tests
- Multiple Test Types: Support for disagg (disaggregated) and wideep architectures
## Directory Structure

```
test_configs/
├── disagg/            # Disaggregated serving tests
│   ├── perf/          # Performance tests
│   └── accuracy/      # Accuracy tests (optional)
└── wideep/            # Wide-deep tests
    ├── perf/
    └── accuracy/
```
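For illustration, automatic discovery over this layout can be as simple as globbing two directory levels below `test_configs/`. The helper below is a hypothetical sketch, not the framework's actual ConfigLoader API:

```python
from pathlib import Path

def discover_configs(root: str = "test_configs"):
    """Yield (test_type, category, path) for every YAML file two levels below root."""
    for path in sorted(Path(root).glob("*/*/*.yaml")):
        test_type, category = path.parts[-3], path.parts[-2]  # e.g. "disagg", "perf"
        yield test_type, category, path

# Example: list everything that would become a test case.
for test_type, category, path in discover_configs():
    print(test_type, category, path.name)
```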
## YAML Configuration

### Minimal Configuration Example

```yaml
metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

slurm:
  partition: "<partition>"
  account: "<account>"
  job_time: "02:00:00"

benchmark:
  mode: "e2e"
  streaming: true
  concurrency_list: "1 2 4 8 16 36"
  input_length: 1024
  output_length: 1024

hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 4

environment:
  container_mount: "<container_mount>"
  container_image: "<container_image>"
  model_path: "<model_path>"

worker_config:
  gen:
    tensor_parallel_size: 8
    moe_expert_parallel_size: 8
    max_batch_size: 32
    max_num_tokens: 32
    max_seq_len: 2251
    # ... other gen worker configs
  ctx:
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    max_batch_size: 4
    max_num_tokens: 4608
    max_seq_len: 2251
    # ... other ctx worker configs
```
### Custom Metrics (Optional)

Most tests use default metrics. To customize:

```yaml
benchmark:
  metrics:
    log_file: "custom_benchmark.log"
    extractor_pattern: 'Custom Pattern:\s+([0-9.]+)'
    metric_names: ["CUSTOM_METRIC"]
```
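As a rough illustration of how such a pattern could be applied to a benchmark log, here is a hedged sketch; it is not the framework's actual LogParser:

```python
import re

def extract_metrics(log_file: str, pattern: str, metric_names: list[str]) -> dict[str, float]:
    """Pair each regex capture found in the log with a configured metric name."""
    regex = re.compile(pattern)
    values: list[float] = []
    with open(log_file) as f:
        for line in f:
            match = regex.search(line)
            if match:
                values.append(float(match.group(1)))
    # Assumes the log yields one value per configured metric, in order.
    return dict(zip(metric_names, values))

# Usage with the custom metrics configuration above:
# extract_metrics("custom_benchmark.log", r"Custom Pattern:\s+([0-9.]+)", ["CUSTOM_METRIC"])
```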
## GPU Support
Currently supports OCI GB200 only. The framework is designed to support additional GPU types in the future.
All configurations must specify:
```yaml
metadata:
  supported_gpus: ["GB200"]
```
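Conceptually, GPU filtering only needs to compare the `GPU_TYPE` environment variable against `supported_gpus`. The helper below is an illustrative sketch under that assumption, not the framework's actual filter:

```python
import os

def is_supported(config: dict, default_gpu: str = "GB200") -> bool:
    """Keep a test only if the current GPU appears in its supported_gpus list."""
    current_gpu = os.environ.get("GPU_TYPE", default_gpu)
    return current_gpu in config.get("metadata", {}).get("supported_gpus", [])

# A GB200-only config is kept on GB200 and skipped on any other GPU_TYPE.
print(is_supported({"metadata": {"supported_gpus": ["GB200"]}}))
```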
## Configuration Validation

The framework validates configurations before execution (a simplified validation sketch follows this list):

- `gen_max_tokens`: Must equal `gen_max_batch_size * (mtp_size + 1)` when MTP is enabled
- `streaming`: Must be `true`
- `max_seq_len`: Both ctx and gen must be greater than `input_length + output_length`
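To make these rules concrete, a simplified validator could look like the sketch below. Field names such as `mtp_size` under `speculative_config` are assumptions used for illustration; the real ConfigValidator may differ:

```python
def validate_config(cfg: dict) -> list[str]:
    """Return human-readable violations of the rules above (illustrative only)."""
    errors = []
    bench = cfg["benchmark"]
    gen = cfg["worker_config"]["gen"]
    ctx = cfg["worker_config"]["ctx"]
    mtp_size = gen.get("speculative_config", {}).get("mtp_size", 0)  # hypothetical key

    if mtp_size > 0 and gen["max_num_tokens"] != gen["max_batch_size"] * (mtp_size + 1):
        errors.append("gen max_num_tokens must equal max_batch_size * (mtp_size + 1)")
    if not bench.get("streaming", False):
        errors.append("streaming must be true")
    min_len = bench["input_length"] + bench["output_length"]
    for name, worker in (("ctx", ctx), ("gen", gen)):
        if worker["max_seq_len"] <= min_len:
            errors.append(f"{name} max_seq_len must exceed input_length + output_length")
    return errors
```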
## Running Tests

### Run all tests

```bash
poetry run pytest --disagg test_disagg.py -s -vv
```

### Run from test list

```bash
poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/disagg.txt
```

### Run specific tests

```bash
# Run only performance tests
poetry run pytest --disagg test_disagg.py -s -vv -m perf

# Run only accuracy tests
poetry run pytest --disagg test_disagg.py -s -vv -m accuracy

# Run specific test by ID
poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"
```
## Test Naming Convention

Tests are automatically named using the format:

```
{test_type}_{category}_{config_filename}
```

Example: `disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL`
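A minimal sketch of how such an ID could be assembled from a config path (the actual assembly in conftest.py may differ):

```python
from pathlib import Path

def make_test_id(config_path: str, test_type: str, category: str) -> str:
    """Compose {test_type}_{category}_{config_filename} from a config file path."""
    return f"{test_type}_{category}_{Path(config_path).stem}"

print(make_test_id(
    "test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml",
    "disagg",
    "perf",
))
# -> disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL
```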
## File Naming Convention

Configuration files should follow this format:

```
{model}_{benchmark_type}_{config_details}.yaml
```

Examples:

- `deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml`
- `deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml`

Where (see the parsing sketch after this list):

- `1k1k`, `8k1k`: Input/output lengths (1024/1024, 8192/1024)
- `ctx1_gen1`: Context and generation server counts
- `dep32` or `tep8`: Data parallel (dep) or tensor parallel (tep) configuration
- `bs32`: Batch size
- `eplb0`: Expert parallel load balancing slots
- `mtp0`: Multi-token prediction layers
- `ccb-NIXL` or `ccb-UCX`: Communication backend
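If you need to recover these fields programmatically, a hypothetical parser for the convention might look like this (the framework itself may not parse filenames this way):

```python
import re

# Hypothetical parser for the filename convention described above.
FILENAME_RE = re.compile(
    r"(?P<model>.+?)_(?P<input_len>\d+k)(?P<output_len>\d+k)"
    r"_ctx(?P<ctx>\d+)_gen(?P<gen>\d+)"
    r"_(?P<parallel>(?:dep|tep)\d+)_bs(?P<bs>\d+)"
    r"_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)_ccb-(?P<ccb>\w+)\.yaml$"
)

name = "deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml"
match = FILENAME_RE.match(name)
print(match.groupdict() if match else "filename does not follow the convention")
```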
## Key Configuration Fields

### Metadata

- `model_name`: Model identifier
- `precision`: Model precision (fp4, fp8, etc.)
- `supported_gpus`: List of compatible GPU types

### Benchmark

- `mode`: Benchmark mode (e2e, gen_only, ctx_only)
- `streaming`: Enable streaming (must be true)
- `input_length`, `output_length`: Sequence lengths
- `concurrency_list`: Concurrency levels to test

### Worker Config

- `tensor_parallel_size`: Tensor parallelism degree
- `moe_expert_parallel_size`: MoE expert parallelism
- `max_batch_size`: Maximum batch size
- `max_num_tokens`: Maximum tokens per batch
- `max_seq_len`: Maximum sequence length
- `speculative_config`: Multi-token prediction settings (optional; see the sketch below)
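For example, an MTP-enabled gen worker could carry a `speculative_config` block like the one below. The key name `mtp_size` is an assumption used for illustration; check existing YAML files for the schema actually in use:

```yaml
worker_config:
  gen:
    max_batch_size: 32
    max_num_tokens: 64        # max_batch_size * (mtp_size + 1) with mtp_size = 1
    speculative_config:       # optional; key name below is an assumption
      mtp_size: 1             # number of multi-token prediction layers
```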
## Test Output

Test results are saved to:

- Performance metrics: `{OUTPUT_PATH}/perf_script_test_results.csv`
- Test logs: `{OUTPUT_PATH}/disagg_benchmark_{timestamp}.log`
## Environment Variables

- `GPU_TYPE`: Current GPU type (default: GB200)
- `OUTPUT_PATH`: Directory for test results and logs
- `WORK_DIR`: Working directory for benchmark execution
- `DEBUG_MODE`: Enable debug mode (set to "1" to skip job submission)
- `DEBUG_JOB_ID`: Job ID to use in debug mode
## Debug Mode

For local testing without SLURM submission:

```bash
export DEBUG_MODE=1
export DEBUG_JOB_ID=12345
poetry run pytest --disagg test_disagg.py -s -vv
```
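Internally, honoring these variables could look roughly like the sketch below; the function name and logic are assumptions, not the framework's actual JobManager implementation:

```python
import os
import subprocess

def submit_or_reuse_job(sbatch_script: str) -> str:
    """Return a SLURM job ID, skipping sbatch entirely when DEBUG_MODE=1."""
    if os.environ.get("DEBUG_MODE") == "1":
        # Reuse an existing job's output instead of submitting new work.
        return os.environ.get("DEBUG_JOB_ID", "0")
    result = subprocess.run(["sbatch", "--parsable", sbatch_script],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```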
## Architecture
The framework consists of:
- ConfigLoader: Scans and loads YAML configurations
- ConfigValidator: Validates configuration correctness
- JobManager: Handles SLURM job submission and monitoring
- LogParser: Extracts metrics from benchmark logs
- TestCaseTracker: Tracks test execution timing
- ResultSaver: Saves results to CSV
## Benefits
- Simple: YAML-based configuration, no code changes needed
- Maintainable: Each test is a separate file
- Flexible: Override defaults only when needed
- Scalable: Easy to add new tests and models
- Reliable: Automatic validation before execution
- Traceable: Comprehensive logging and result tracking
## Adding New Tests

1. Create a new YAML file in `test_configs/{test_type}/{category}/`
2. Configure the test parameters
3. Run pytest - the test will be automatically discovered
No code changes required!
For detailed configuration options and advanced usage, refer to the inline comments in the YAML configuration files.