# Test Configurations
This directory contains YAML configuration files for TensorRT-LLM disaggregated benchmark tests.
## Directory Structure
```
test_configs/
├── disagg/              # Disaggregated architecture
│   ├── perf/            # Performance tests (24 configs)
│   └── accuracy/        # Accuracy tests
└── wideep/              # Wide-deep architecture
    ├── perf/            # Performance tests (15 configs)
    └── accuracy/        # Accuracy tests (1 config)
```
## File Naming Convention

Format: `{model}_{benchmark_type}_ctx{N}_gen{M}_{parallel_config}_bs{B}_eplb{E}_mtp{T}_ccb-{backend}.yaml`

Components:

- `model`: Model name (e.g., `deepseek-r1-fp4`, `Qwen3-235B-A22B-FP4`)
- `benchmark_type`: Input/output lengths (e.g., `1k1k` = 1024/1024, `8k1k` = 8192/1024)
- `ctx{N}_gen{M}`: N context servers, M generation servers
- Parallel configuration:
  - `dep{N}`: Data parallel with attention DP, TP size = N
  - `tep{N}`: Tensor parallel only, TP size = N
- `bs{B}`: Batch size
- `eplb{E}`: Expert parallel load balancing slots (0 = disabled)
- `mtp{T}`: Multi-token prediction layers (0 = disabled)
- `backend`: Cache transceiver backend (`NIXL`, `UCX`, or `DEFAULT`)
Examples:

- `deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml`
  → DeepSeek R1 FP4, 1k/1k, 1 ctx + 4 gen servers, TP=8, BS=32, no MTP, NIXL backend
- `deepseek-r1-fp4_8k1k_ctx1_gen3_tep8_bs16_eplb0_mtp3_ccb-UCX.yaml`
  → DeepSeek R1 FP4, 8k/1k, 1 ctx + 3 gen servers, TP=8, BS=16, MTP=3, UCX backend
- `Qwen3-235B-A22B-FP4_1k1k_ctx2_gen1_dep16_bs128_eplb0_mtp1_ccb-NIXL.yaml`
  → Qwen3 235B FP4, 1k/1k, 2 ctx + 1 gen servers, DP+TP=16, BS=128, MTP=1, NIXL backend
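Because the convention is strictly positional, filenames can be decomposed mechanically. The following is a minimal parsing sketch; the regex and helper name are illustrative, not part of the test harness:

```python
import re

# Pattern mirrors the documented convention:
# {model}_{benchmark_type}_ctx{N}_gen{M}_{dep|tep}{P}_bs{B}_eplb{E}_mtp{T}_ccb-{backend}.yaml
CONFIG_NAME = re.compile(
    r"(?P<model>.+)_(?P<benchmark_type>\d+k\d+k)"
    r"_ctx(?P<ctx_servers>\d+)_gen(?P<gen_servers>\d+)"
    r"_(?P<parallel>dep|tep)(?P<tp_size>\d+)"
    r"_bs(?P<batch_size>\d+)_eplb(?P<eplb_slots>\d+)_mtp(?P<mtp_layers>\d+)"
    r"_ccb-(?P<backend>\w+)\.yaml"
)

def parse_config_name(filename: str) -> dict:
    """Split a config filename into its documented components."""
    match = CONFIG_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized config filename: {filename}")
    return match.groupdict()

print(parse_config_name(
    "deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml"
))
# {'model': 'deepseek-r1-fp4', 'benchmark_type': '1k1k', 'ctx_servers': '1',
#  'gen_servers': '4', 'parallel': 'tep', 'tp_size': '8', 'batch_size': '32',
#  'eplb_slots': '0', 'mtp_layers': '0', 'backend': 'NIXL'}
```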
## Current Test Configurations

### Models
- DeepSeek R1 FP4: 16 configs (disagg) + 15 configs (wideep) = 31 configs
- Qwen3-235B-A22B FP4: 8 configs (disagg) + 6 configs (wideep) = 14 configs
### Benchmark Types
- 1k1k (1024/1024): 32 configs
- 8k1k (8192/1024): 8 configs
### Backends
- NIXL: 20 configs
- UCX: 19 configs
- DEFAULT: 1 config
## Configuration Structure
Each YAML file contains:
```yaml
metadata:
  model_name: "deepseek-r1-fp4"
  precision: "fp4"
  supported_gpus: ["GB200"]

benchmark:
  input_length: 1024
  output_length: 1024
  streaming: true
  concurrency_list: "..."

hardware:
  num_ctx_servers: 1
  num_gen_servers: 4

worker_config:
  gen:
    tensor_parallel_size: 8
    max_batch_size: 32
    max_seq_len: 2251
    speculative_config:
      num_nextn_predict_layers: 3  # MTP layers
  ctx:
    tensor_parallel_size: 4
    max_batch_size: 4
    max_seq_len: 2251
```
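A file with this structure can be read with any standard YAML parser. A minimal sketch, assuming PyYAML and one of the config files listed above:

```python
import yaml  # PyYAML

with open("deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml") as f:
    config = yaml.safe_load(f)

# Nested YAML sections load as plain nested dicts.
gen = config["worker_config"]["gen"]
print(config["metadata"]["model_name"])   # deepseek-r1-fp4
print(gen["tensor_parallel_size"])        # 8
print(gen["max_batch_size"])              # 32
```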
## Key Configuration Constraints

- `streaming`: Must be `true`
- `max_seq_len`: Both ctx and gen values must be greater than `input_length + output_length`
- `gen_max_tokens`: Must equal `gen_max_batch_size * (mtp_size + 1)` when MTP is enabled
- `supported_gpus`: Currently all configs use `["GB200"]`
## Configuration Mapping Reference
| Field | YAML Path |
|---|---|
| Input/Output lengths | `benchmark.input_length`, `benchmark.output_length` |
| Server counts | `hardware.num_ctx_servers`, `hardware.num_gen_servers` |
| Tensor parallel size | `worker_config.gen.tensor_parallel_size` |
| Batch size | `worker_config.gen.max_batch_size` |
| Attention DP | `worker_config.gen.enable_attention_dp` |
| MTP layers | `worker_config.gen.speculative_config.num_nextn_predict_layers` |
| Backend | `worker_config.gen.cache_transceiver_config.backend` |
| Concurrency levels | `benchmark.concurrency_list` |
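For reporting or filtering, the table above can double as a lookup map. A sketch with a hypothetical helper, operating on a config dict loaded as shown earlier; the flat field names are my own:

```python
# YAML paths from the mapping table, expressed as key tuples.
FIELD_PATHS = {
    "input_length":    ("benchmark", "input_length"),
    "output_length":   ("benchmark", "output_length"),
    "num_ctx_servers": ("hardware", "num_ctx_servers"),
    "num_gen_servers": ("hardware", "num_gen_servers"),
    "tp_size":         ("worker_config", "gen", "tensor_parallel_size"),
    "batch_size":      ("worker_config", "gen", "max_batch_size"),
    "attention_dp":    ("worker_config", "gen", "enable_attention_dp"),
    "mtp_layers":      ("worker_config", "gen", "speculative_config",
                        "num_nextn_predict_layers"),
    "backend":         ("worker_config", "gen", "cache_transceiver_config", "backend"),
    "concurrency":     ("benchmark", "concurrency_list"),
}

def summarize(config: dict) -> dict:
    """Flatten the mapped fields into one dict; missing paths become None."""
    def walk(path):
        node = config
        for key in path:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node
    return {name: walk(path) for name, path in FIELD_PATHS.items()}
```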
## Adding New Configurations

1. Copy an existing config file as a template
2. Update the filename to match your configuration
3. Modify the YAML content:
   - Update the `metadata` section
   - Adjust `benchmark` parameters
   - Configure `worker_config` for ctx and gen
4. Ensure the configuration constraints are met
5. Run tests; the new config will be automatically discovered
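The automatic discovery in the last step can be pictured as a recursive scan of this directory. A sketch only; the harness's actual discovery mechanism may differ:

```python
from pathlib import Path

# Pick up every YAML config under test_configs/, matching the layout above.
configs = sorted(Path("test_configs").rglob("*.yaml"))
for path in configs:
    print(path.relative_to("test_configs"))
    # e.g., disagg/perf/deepseek-r1-fp4_1k1k_ctx1_gen4_tep8_bs32_eplb0_mtp0_ccb-NIXL.yaml
```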
For detailed usage and test execution, see the main README.md in the parent directory.