TensorRT-LLM Disaggregated Benchmark Framework

A YAML-based testing framework for TensorRT-LLM disaggregated serving performance and accuracy benchmarks.

Overview

This framework provides a simple, maintainable approach to benchmark testing using YAML configuration files. Each test configuration is defined in a separate YAML file, with automatic test discovery and execution through pytest.

Key Features

  • YAML Configuration: Each test has its own independent YAML configuration file
  • Automatic Test Discovery: Tests are automatically discovered from the config directory structure
  • Default Metrics: Built-in default metrics configuration for common test scenarios
  • GPU Filtering: Automatically filters tests based on hardware compatibility
  • Flexible Override: Override default configurations as needed for special cases
  • Test Categories: Support for both performance (perf) and accuracy tests
  • Multiple Test Types: Support for disagg (disaggregated) and wideep architectures

Directory Structure

test_configs/
├── disagg/                    # Disaggregated serving tests
│   ├── perf/                  # Performance tests
│   └── accuracy/              # Accuracy tests (optional)
└── wideep/                    # Wide-deep tests
    ├── perf/
    └── accuracy/
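
Discovery maps directly onto this layout. Below is a minimal sketch of how configurations could be enumerated; the function name and return shape are illustrative, not the framework's actual API:

from pathlib import Path

def discover_configs(root: str = "test_configs"):
    # Walk test_configs/{test_type}/{category}/*.yaml and yield one
    # (test_type, category, path) tuple per configuration file.
    for path in sorted(Path(root).glob("*/*/*.yaml")):
        yield path.parts[-3], path.parts[-2], path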

YAML Configuration

Minimal Configuration Example

metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

slurm:
  partition: "<partition>"
  account: "<account>"
  job_time: "02:00:00"

benchmark:
  mode: "e2e"
  streaming: true
  concurrency_list: "1 2 4 8 16 36"
  input_length: 1024
  output_length: 1024

hardware:
  gpus_per_node: 4
  num_ctx_servers: 1
  num_gen_servers: 4

environment:
  container_mount: "<container_mount>"
  container_image: "<container_image>"
  model_path: "<model_path>"

worker_config:
  gen:
    tensor_parallel_size: 8
    moe_expert_parallel_size: 8
    max_batch_size: 32
    max_num_tokens: 32
    max_seq_len: 2251
    # ... other gen worker configs
  
  ctx:
    tensor_parallel_size: 4
    moe_expert_parallel_size: 4
    max_batch_size: 4
    max_num_tokens: 4608
    max_seq_len: 2251
    # ... other ctx worker configs

Custom Metrics (Optional)

Most tests use default metrics. To customize:

benchmark:
  metrics:
    log_file: "custom_benchmark.log"
    extractor_pattern: 'Custom Pattern:\s+([0-9.]+)'
    metric_names: ["CUSTOM_METRIC"]
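
Note the single quotes: in a double-quoted YAML scalar, \s is an invalid escape and most parsers reject it. Internally, the pattern is applied to the benchmark log as a regular expression with one capture group, and the captured values are paired with metric_names in order. A minimal sketch of that extraction, with a hypothetical function name (the actual LogParser may differ):

import re

def extract_metrics(log_text: str, pattern: str,
                    metric_names: list[str]) -> dict[str, float]:
    # One regex match per metric; pair captured values with names in order.
    values = [float(m.group(1)) for m in re.finditer(pattern, log_text)]
    return dict(zip(metric_names, values))

# extract_metrics("Custom Pattern: 42.5",
#                 r"Custom Pattern:\s+([0-9.]+)", ["CUSTOM_METRIC"])
# -> {"CUSTOM_METRIC": 42.5}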

GPU Support

The framework currently supports OCI GB200 only; it is designed to support additional GPU types in the future.

All configurations must specify:

metadata:
  supported_gpus: ["GB200"]
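
Hardware filtering then reduces to checking the current GPU type against this list; non-matching tests are skipped rather than failed. A minimal sketch, using the GPU_TYPE environment variable described under Environment Variables:

import os

def is_supported(config: dict) -> bool:
    # Skip the test unless the current GPU type (GPU_TYPE, default GB200)
    # appears in the config's supported_gpus list.
    gpu_type = os.environ.get("GPU_TYPE", "GB200")
    return gpu_type in config["metadata"]["supported_gpus"]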

Configuration Validation

The framework validates each configuration before execution (a minimal sketch of these checks follows the list):

  1. gen_max_tokens: Must equal gen_max_batch_size * (mtp_size + 1) when MTP is enabled
  2. streaming: Must be true
  3. max_seq_len: Both the ctx and gen values must be greater than input_length + output_length
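
The sketch below uses field names from the example configuration above; the MTP field name (num_nextn_predict_layers) is an assumption, and the framework's exact schema may differ:

def validate(config: dict) -> None:
    bench = config["benchmark"]
    gen = config["worker_config"]["gen"]
    ctx = config["worker_config"]["ctx"]

    # Rule 2: streaming must be enabled.
    assert bench["streaming"] is True, "streaming must be true"

    # Rule 1: with MTP enabled, the gen worker needs room for one extra
    # draft token per MTP layer for every request in the batch.
    spec = gen.get("speculative_config") or {}
    mtp_size = spec.get("num_nextn_predict_layers", 0)
    if mtp_size > 0:
        assert gen["max_num_tokens"] == gen["max_batch_size"] * (mtp_size + 1)

    # Rule 3: both workers must fit a full request end to end.
    needed = bench["input_length"] + bench["output_length"]
    assert gen["max_seq_len"] > needed and ctx["max_seq_len"] > needed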

Running Tests

Run all tests

poetry run pytest --disagg test_disagg.py -s -vv

Run from test list

poetry run pytest --disagg test_disagg.py -s -vv --disagg-test-list=./testlist/disagg.txt

Run specific tests

# Run only performance tests
poetry run pytest --disagg test_disagg.py -s -vv -m perf

# Run only accuracy tests
poetry run pytest --disagg test_disagg.py -s -vv -m accuracy

# Run specific test by ID
poetry run pytest --disagg test_disagg.py -s -vv -k "deepseek-r1-fp4_1k1k"

Test Naming Convention

Tests are automatically named using the format:

{test_type}_{category}_{config_filename}

Example: disagg_perf_deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL
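
So a config at test_configs/disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml produces exactly that ID. A minimal sketch of the rule (illustrative, not the framework's exact code):

from pathlib import Path

def test_id(config_path: str) -> str:
    # test_configs/disagg/perf/foo.yaml -> "disagg_perf_foo"
    p = Path(config_path)
    return f"{p.parts[-3]}_{p.parts[-2]}_{p.stem}"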

File Naming Convention

Configuration files should follow this format:

{model}_{benchmark_type}_{config_details}.yaml

Examples:

  • deepseek-r1-fp4_1k1k_ctx1_gen1_dep32_bs32_eplb0_mtp0_ccb-NIXL.yaml
  • deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX.yaml

Where (see the parsing sketch after this list):

  • 1k1k, 8k1k: Input/output lengths (1024/1024, 8192/1024)
  • ctx1_gen1: Context and generation server counts
  • dep32 or tep8: Data parallel (dep) or tensor parallel (tep) configuration
  • bs32: Batch size
  • eplb0: Expert parallel load balancing slots
  • mtp0: Multi-token prediction layers
  • ccb-NIXL or ccb-UCX: Communication backend
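
The sketch below decodes these tokens from a filename stem. It is purely illustrative; the framework reads the authoritative values from the YAML itself, not from the filename:

import re

# Matches e.g. deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs32_eplb0_mtp0_ccb-UCX
NAME_RE = re.compile(
    r"(?P<model>.+)_(?P<io>\d+k\d+k)"
    r"_ctx(?P<ctx>\d+)_gen(?P<gen>\d+)"
    r"_(?P<parallel>[dt]ep\d+)_bs(?P<bs>\d+)"
    r"_eplb(?P<eplb>\d+)_mtp(?P<mtp>\d+)_ccb-(?P<ccb>\w+)"
)

def parse_name(stem: str) -> dict:
    m = NAME_RE.fullmatch(stem)
    return m.groupdict() if m else {}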

Key Configuration Fields

Metadata

  • model_name: Model identifier
  • precision: Model precision (fp4, fp8, etc.)
  • supported_gpus: List of compatible GPU types

Benchmark

  • mode: Benchmark mode (e2e, gen_only, ctx_only)
  • streaming: Enable streaming (must be true)
  • input_length, output_length: Sequence lengths
  • concurrency_list: Concurrency levels to test

Worker Config

  • tensor_parallel_size: Tensor parallelism degree
  • moe_expert_parallel_size: MoE expert parallelism
  • max_batch_size: Maximum batch size
  • max_num_tokens: Maximum tokens per batch
  • max_seq_len: Maximum sequence length
  • speculative_config: Multi-token prediction settings (optional)

Test Output

Test results are saved to:

  • Performance metrics: {OUTPUT_PATH}/perf_script_test_results.csv
  • Test logs: {OUTPUT_PATH}/disagg_benchmark_{timestamp}.log

Environment Variables

  • GPU_TYPE: Current GPU type (default: GB200)
  • OUTPUT_PATH: Directory for test results and logs
  • WORK_DIR: Working directory for benchmark execution
  • DEBUG_MODE: Enable debug mode (set to "1" to skip job submission)
  • DEBUG_JOB_ID: Job ID to use in debug mode

Debug Mode

For local testing without SLURM submission:

export DEBUG_MODE=1
export DEBUG_JOB_ID=12345
poetry run pytest --disagg test_disagg.py -s -vv

Architecture

The framework consists of six components, composed end to end in the sketch after this list:

  1. ConfigLoader: Scans and loads YAML configurations
  2. ConfigValidator: Validates configuration correctness
  3. JobManager: Handles SLURM job submission and monitoring
  4. LogParser: Extracts metrics from benchmark logs
  5. TestCaseTracker: Tracks test execution timing
  6. ResultSaver: Saves results to CSV
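
The following self-contained sketch shows one way these pieces could compose per test. The SLURM script name, log pattern, and CSV columns are placeholders, not the framework's real ones:

import csv
import os
import re
import subprocess
import time

import yaml  # PyYAML, assumed available in the poetry environment

def run_one_test(config_path: str, log_path: str, results_csv: str) -> None:
    # ConfigLoader: read the YAML configuration.
    with open(config_path) as f:
        config = yaml.safe_load(f)

    # ConfigValidator: fail fast before touching SLURM.
    assert config["benchmark"]["streaming"] is True

    # TestCaseTracker: time the full test case.
    start = time.time()

    # JobManager: submit the job, or reuse one in debug mode.
    if os.environ.get("DEBUG_MODE") == "1":
        job_id = os.environ.get("DEBUG_JOB_ID", "0")
    else:
        out = subprocess.run(
            ["sbatch", "--parsable", "benchmark.slurm"],  # placeholder script
            capture_output=True, text=True, check=True,
        )
        job_id = out.stdout.strip()
        # ... poll sacct/squeue for job_id until the job finishes ...

    # LogParser: extract metrics from the benchmark log (placeholder pattern).
    with open(log_path) as f:
        values = re.findall(r"([0-9.]+)\s+tokens/sec", f.read())

    # ResultSaver: append one row to the results CSV.
    with open(results_csv, "a", newline="") as f:
        csv.writer(f).writerow(
            [config["metadata"]["model_name"], job_id, values,
             time.time() - start]
        )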

Benefits

  • Simple: YAML-based configuration, no code changes needed
  • Maintainable: Each test is a separate file
  • Flexible: Override defaults only when needed
  • Scalable: Easy to add new tests and models
  • Reliable: Automatic validation before execution
  • Traceable: Comprehensive logging and result tracking

Adding New Tests

  1. Create a new YAML file in test_configs/{test_type}/{category}/
  2. Configure the test parameters
  3. Run pytest; the test will be discovered automatically

No code changes required!


For detailed configuration options and advanced usage, refer to the inline comments in the YAML configuration files.