TensorRT-LLM Disaggregated Benchmark Framework

A YAML-based testing framework for TensorRT-LLM disaggregated serving performance and accuracy benchmarks.

Overview

This framework provides a simple, maintainable approach to benchmark testing using YAML configuration files. Each test configuration is defined in a separate YAML file, with automatic test discovery and execution through pytest.

Key Features

  • YAML Configuration: Each test has its own independent YAML configuration file
  • Automatic Test Discovery: Tests are automatically discovered from the config directory structure
  • Default Metrics: Built-in default metrics configuration for common test scenarios
  • GPU Filtering: Automatically filters tests based on hardware compatibility
  • Flexible Override: Override default configurations as needed for special cases
  • Test Categories: Support for both performance (perf) and accuracy tests
  • Multiple Test Types: Support for disagg (disaggregated) and wideep architectures

Directory Structure

test_configs/
├── disagg/                    # Disaggregated serving tests
│   ├── perf/                  # Performance tests
│   └── accuracy/              # Accuracy tests (optional)
└── wideep/                    # Wide-deep tests
    ├── perf/
    └── accuracy/
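
Discovery maps directly onto this layout. Below is a minimal sketch of how configurations could be enumerated; the function name and return shape are illustrative, not the framework's actual API:

from pathlib import Path

def discover_configs(root: str = "test_configs"):
    # Walk test_configs/{test_type}/{category}/*.yaml and yield one
    # (test_type, category, path) tuple per configuration file.
    for path in sorted(Path(root).glob("*/*/*.yaml")):
        yield path.parts[-3], path.parts[-2], path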

YAML Configuration

Minimal Configuration Example

metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

slurm:
  partition: "<partition>"
  account: "<account>"
  job_time: "02:00:00"

benchmark:
  mode: "e2e"
  streaming: true
  concurrency_list: "1 2 4 8 16 36"
  input_length: 1024
  output_length: 1024

hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 4

environment:
  container_mount: "<container_mount>"
  container_image: "<container_image>"
  model_path: "<model_path>"

worker_config:
  gen:
    tensor_parallel_size: 8
    moe_expert_parallel_size: 8
    max_batch_size: 32
    max_num_tokens: 32
    max_seq_len: 2251
    # ... other gen worker configs
  
  ctx:
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    max_batch_size: 4
    max_num_tokens: 4608
    max_seq_len: 2251
    # ... other ctx worker configs

Custom Metrics (Optional)

Most tests use default metrics. To customize:

benchmark:
  metrics:
    log_file: "custom_benchmark.log"
    extractor_pattern: 'Custom Pattern:\s+([0-9.]+)'
    metric_names: ["CUSTOM_METRIC"]
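
Note the single quotes: in a double-quoted YAML scalar, \s is an invalid escape and most parsers reject it. Internally, the pattern is applied to the benchmark log as a regular expression with one capture group, and the captured values are paired with metric_names in order. A minimal sketch of that extraction, with a hypothetical function name (the actual LogParser may differ):

import re

def extract_metrics(log_text: str, pattern: str,
                    metric_names: list[str]) -> dict[str, float]:
    # One regex match per metric; pair captured values with names in order.
    values = [float(m.group(1)) for m in re.finditer(pattern, log_text)]
    return dict(zip(metric_names, values))

# extract_metrics("Custom Pattern: 42.5",
#                 r"Custom Pattern:\s+([0-9.]+)", ["CUSTOM_METRIC"])
# -> {"CUSTOM_METRIC": 42.5}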

GPU Support

The framework currently supports OCI GB200 only; it is designed to support additional GPU types in the future.

All configurations must specify:

metadata:
  supported_gpus: ["GB200"]
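
Hardware filtering then reduces to checking the current GPU type against this list; non-matching tests are skipped rather than failed. A minimal sketch, using the GPU_TYPE environment variable described under Environment Variables:

import os

def is_supported(config: dict) -> bool:
    # Skip the test unless the current GPU type (GPU_TYPE, default GB200)
    # appears in the config's supported_gpus list.
    gpu_type = os.environ.get("GPU_TYPE", "GB200")
    return gpu_type in config["metadata"]["supported_gpus"]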

Configuration Validation

The framework validates each configuration before execution (a minimal sketch of these checks follows the list):

  1. gen_max_tokens: Must equal gen_max_batch_size * (mtp_size + 1) when MTP is enabled
  2. streaming: Must be true
  3. max_seq_len: Both the ctx and gen values must be greater than input_length + output_length
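
The sketch below uses field names from the example configuration above; the MTP field name (num_nextn_predict_layers) is an assumption, and the framework's exact schema may differ:

def validate(config: dict) -> None:
    bench = config["benchmark"]
    gen = config["worker_config"]["gen"]
    ctx = config["worker_config"]["ctx"]

    # Rule 2: streaming must be enabled.
    assert bench["streaming"] is True, "streaming must be true"

    # Rule 1: with MTP enabled, the gen worker needs room for one extra
    # draft token per MTP layer for every request in the batch.
    spec = gen.get("speculative_config") or {}
    mtp_size = spec.get("num_nextn_predict_layers", 0)
    if mtp_size > 0:
        assert gen["max_num_tokens"] == gen["max_batch_size"] * (mtp_size + 1)

    # Rule 3: both workers must fit a full request end to end.
    needed = bench["input_length"] + bench["output_length"]
    assert gen["max_seq_len"] > needed and ctx["max_seq_len"] > needed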

Running Tests

Run all tests

poetry run pytest --disagg test_disagg.py -s -vv

Run from test list

poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/disagg.txt

Run specific tests

# Run only performance tests
poetry run pytest --disagg test_disagg.py -s -vv -m perf

# Run only accuracy tests
poetry run pytest --disagg test_disagg.py -s -vv -m accuracy

# Run specific test by ID
poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"

Test Naming Convention

Tests are automatically named using the format:

{test_type}_{category}_{config_filename}

Example: disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL
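
So a config at test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml produces exactly that ID. A minimal sketch of the rule (illustrative, not the framework's exact code):

from pathlib import Path

def test_id(config_path: str) -> str:
    # test_configs/disagg/perf/foo.yaml -> "disagg_perf_foo"
    p = Path(config_path)
    return f"{p.parts[-3]}_{p.parts[-2]}_{p.stem}"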

File Naming Convention

Configuration files should follow this format:

{model}_{benchmark_type}_{config_details}.yaml

Examples:

  • deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml

Where (see the parsing sketch after this list):

  • 1k1k, 8k1k: Input/output lengths (1024/1024, 8192/1024)
  • ctx1_gen1: Context and generation server counts
  • dep32 or tep8: Data parallel (dep) or tensor parallel (tep) configuration
  • bs32: Batch size
  • eplb0: Expert parallel load balancing slots
  • mtp0: Multi-token prediction layers
  • ccb-NIXL or ccb-UCX: Communication backend
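
The sketch below decodes these tokens from a filename stem. It is purely illustrative; the framework reads the authoritative values from the YAML itself, not from the filename:

import re

# Matches e.g. deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX
NAME_RE = re.compile(
    r"(?P<model>.+)_(?P<io>\d+k\d+k)"
    r"_ctx(?P<ctx>\d+)_gen(?P<gen>\d+)"
    r"_(?P<parallel>[dt]ep\d+)_bs(?P<bs>\d+)"
    r"_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)_ccb-(?P<ccb>\w+)"
)

def parse_name(stem: str) -> dict:
    m = NAME_RE.fullmatch(stem)
    return m.groupdict() if m else {}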

Key Configuration Fields

Metadata

  • model_name: Model identifier
  • precision: Model precision (fp4, fp8, etc.)
  • supported_gpus: List of compatible GPU types

Benchmark

  • mode: Benchmark mode (e2e, gen_only, ctx_only)
  • streaming: Enable streaming (must be true)
  • input_length, output_length: Sequence lengths
  • concurrency_list: Concurrency levels to test

Worker Config

  • tensor_parallel_size: Tensor parallelism degree
  • moe_expert_parallel_size: MoE expert parallelism
  • max_batch_size: Maximum batch size
  • max_num_tokens: Maximum tokens per batch
  • max_seq_len: Maximum sequence length
  • speculative_config: Multi-token prediction settings (optional)

Test Output

Test results are saved to:

  • Performance metrics: {OUTPUT_PATH}/perf_script_test_results.csv
  • Test logs: {OUTPUT_PATH}/disagg_benchmark_{timestamp}.log

Environment Variables

  • GPU_TYPE: Current GPU type (default: GB200)
  • OUTPUT_PATH: Directory for test results and logs
  • WORK_DIR: Working directory for benchmark execution
  • DEBUG_MODE: Enable debug mode (set to "1" to skip job submission)
  • DEBUG_JOB_ID: Job ID to use in debug mode

Debug Mode

For local testing without SLURM submission:

export DEBUG_MODE=1
export DEBUG_JOB_ID=12345
poetry run pytest --disagg test_disagg.py -s -vv

Architecture

The framework consists of six components, composed end to end in the sketch after this list:

  1. ConfigLoader: Scans and loads YAML configurations
  2. ConfigValidator: Validates configuration correctness
  3. JobManager: Handles SLURM job submission and monitoring
  4. LogParser: Extracts metrics from benchmark logs
  5. TestCaseTracker: Tracks test execution timing
  6. ResultSaver: Saves results to CSV
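
The following self-contained sketch shows one way these pieces could compose per test. The SLURM script name, log pattern, and CSV columns are placeholders, not the framework's real ones:

import csv
import os
import re
import subprocess
import time

import yaml  # PyYAML, assumed available in the poetry environment

def run_one_test(config_path: str, log_path: str, results_csv: str) -> None:
    # ConfigLoader: read the YAML configuration.
    with open(config_path) as f:
        config = yaml.safe_load(f)

    # ConfigValidator: fail fast before touching SLURM.
    assert config["benchmark"]["streaming"] is True

    # TestCaseTracker: time the full test case.
    start = time.time()

    # JobManager: submit the job, or reuse one in debug mode.
    if os.environ.get("DEBUG_MODE") == "1":
        job_id = os.environ.get("DEBUG_JOB_ID", "0")
    else:
        out = subprocess.run(
            ["sbatch", "--parsable", "benchmark.slurm"],  # placeholder script
            capture_output=True, text=True, check=True,
        )
        job_id = out.stdout.strip()
        # ... poll sacct/squeue for job_id until the job finishes ...

    # LogParser: extract metrics from the benchmark log (placeholder pattern).
    with open(log_path) as f:
        values = re.findall(r"([0-9.]+)\s+tokens/sec", f.read())

    # ResultSaver: append one row to the results CSV.
    with open(results_csv, "a", newline="") as f:
        csv.writer(f).writerow(
            [config["metadata"]["model_name"], job_id, values,
             time.time() - start]
        )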

Benefits

  • Simple: YAML-based configuration, no code changes needed
  • Maintainable: Each test is a separate file
  • Flexible: Override defaults only when needed
  • Scalable: Easy to add new tests and models
  • Reliable: Automatic validation before execution
  • Traceable: Comprehensive logging and result tracking

Adding New Tests

  1. Create a new YAML file in test_configs/{test_type}/{category}/
  2. Configure the test parameters
  3. Run pytest; the test will be discovered automatically

No code changes required!


For detailed configuration options and advanced usage, refer to the inline comments in the YAML configuration files.