mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-22 19:52:38 +08:00


TensorRT-LLM Perf Sanity Test System

Performance sanity testing scripts for TensorRT-LLM with configuration-driven test cases supporting single-node, multi-node aggregated, and multi-node disaggregated architectures.

Overview

Run performance sanity benchmarks across multiple model configs
Support three deployment architectures: single-node, multi-node aggregated, and multi-node disaggregated
Manage test cases through YAML config files
Automate resource calculation and job submission via SLURM

Configuration File Types

There are three types of YAML config files for different deployment architectures. Aggregated config files are in tests/scripts/perf-sanity. Disaggregated config files are in tests/integration/defs/perf/disagg/test_configs/disagg/perf.
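As a rough illustration of this layout, the sketch below maps each architecture to the directory named above and lists the YAML files found there. The helper names (`config_dir`, `list_configs`) and the architecture labels are hypothetical and are not part of the test system; only the two directory paths come from this README.

```python
from pathlib import Path

# Directories as stated in this README, relative to the repository root.
AGGREGATED_DIR = Path("tests/scripts/perf-sanity")
DISAGGREGATED_DIR = Path(
    "tests/integration/defs/perf/disagg/test_configs/disagg/perf"
)

# Hypothetical labels for the three architectures described in this README.
AGGREGATED_ARCHITECTURES = {"single-node", "multi-node-aggregated"}
DISAGGREGATED_ARCHITECTURES = {"multi-node-disaggregated"}


def config_dir(architecture: str) -> Path:
    """Return the config directory for an architecture (hypothetical helper)."""
    if architecture in AGGREGATED_ARCHITECTURES:
        return AGGREGATED_DIR
    if architecture in DISAGGREGATED_ARCHITECTURES:
        return DISAGGREGATED_DIR
    raise ValueError(f"unknown architecture: {architecture!r}")


def list_configs(repo_root: Path, architecture: str) -> list[Path]:
    """List the YAML config files for an architecture under repo_root."""
    return sorted((repo_root / config_dir(architecture)).glob("*.yaml"))
```

For example, `list_configs(Path("."), "single-node")` run from the repository root would return the YAML files under tests/scripts/perf-sanity, assuming that directory exists in the checkout.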

1. Single-Node Aggregated Test Configuration

File Example: deepseek_r1_fp4_v2_grace_blackwell.yaml

Use Case: Performance tests on a single server with multiple GPUs.

2. Multi-Node Aggregated Test Configuration

File Example: deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml

Use Case: Multi-node aggregated architecture where the model runs across multiple nodes with unified execution.

3. Multi-Node Disaggregated Test Configuration

File Example: deepseek-r1-fp4_1k1k_ctx1_gen1_dep8_bs768_eplb0_mtp0_ccb-UCX.yaml

Use Case: Disaggregated architecture where the model runs across multiple nodes with separate context (prefill) and generation (decode) servers.
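The disaggregated file name above packs several underscore-separated tokens into one string. This README does not document the naming scheme, so the sketch below only splits the name mechanically and collects tokens of the form letters followed by digits; it does not assign any meaning to them. The function `split_config_name` is a hypothetical helper, not part of the test system.

```python
import re
from pathlib import Path

# Matches tokens made of letters followed by digits, e.g. "ctx1" or "bs768".
_KEY_NUMBER = re.compile(r"([A-Za-z]+)(\d+)")


def split_config_name(filename: str) -> dict:
    """Split a config file name into key/number fields and leftover tokens.

    Assumption: tokens are separated by underscores. Their meaning is not
    defined here and is not interpreted.
    """
    stem = Path(filename).stem  # drop the ".yaml" extension
    fields = {}
    other = []
    for token in stem.split("_"):
        match = _KEY_NUMBER.fullmatch(token)
        if match:
            fields[match.group(1)] = int(match.group(2))
        else:
            other.append(token)
    return {"fields": fields, "other": other}


result = split_config_name(
    "deepseek-r1-fp4_1k1k_ctx1_gen1_dep8_bs768_eplb0_mtp0_ccb-UCX.yaml"
)
print(result["fields"])
print(result["other"])
```

Tokens that do not fit the letters-plus-digits shape, such as the leading model name, are kept verbatim in `other` rather than guessed at.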