mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

TensorRT-LLM Perf Sanity Test System

Performance sanity testing scripts for TensorRT-LLM with configuration-driven test cases supporting single-node, multi-node aggregated, and multi-node disaggregated architectures.

Overview

Run performance sanity benchmarks across multiple model configs
Support three deployment architectures: single-node, multi-node aggregated, and multi-node disaggregated
Manage test cases through YAML config files
Automate resource calculation and job submission via SLURM

Configuration File Types

Perf sanity tests run in one of two modes: aggregated (aggr), which covers both single-node and multi-node aggregated deployments, and disaggregated (disagg).

Aggregated Mode (aggr)

Config Location: tests/scripts/perf-sanity

File Naming: xxx.yaml, with words joined by underscores (_), not hyphens (-).

File Examples:

deepseek_r1_fp4_v2_grace_blackwell.yaml - Single-node aggregated test
deepseek_r1_fp4_v2_2_nodes_grace_blackwell.yaml - Multi-node aggregated test
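
The naming rule for aggregated configs can be checked mechanically. The following is a minimal sketch, not code from the repository: the helper name is_valid_aggr_config_name is hypothetical, and it enforces only the two constraints stated above (a .yaml extension and no hyphen in the base name).

```python
from pathlib import Path


def is_valid_aggr_config_name(filename: str) -> bool:
    """Return True if filename follows the aggregated-config naming rule.

    Hypothetical helper: it checks only the stated constraints, namely a
    .yaml extension and a non-empty base name without hyphens.
    """
    path = Path(filename)
    return path.suffix == ".yaml" and bool(path.stem) and "-" not in path.stem


if __name__ == "__main__":
    candidates = (
        "deepseek_r1_fp4_v2_grace_blackwell.yaml",  # name taken from the examples above
        "deepseek-r1-fp4-v2.yaml",                  # hypothetical name that breaks the rule
    )
    for name in candidates:
        print(name, is_valid_aggr_config_name(name))
```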

Use Cases:

Single-node: Performance tests on a single server with multiple GPUs
Multi-node: Model runs across multiple nodes with unified execution

Test Case Names:

perf/test_perf_sanity.py::test_e2e[aggr_upload-{config yaml file base name}]
perf/test_perf_sanity.py::test_e2e[aggr_upload-{config yaml file base name}-{server_config_name}]

Without server config name: runs all server configs in the YAML file
With server config name: runs only the specified server config (the name field in server_configs)

Examples:

perf/test_perf_sanity.py::test_e2e[aggr_upload-deepseek_r1_fp4_v2_grace_blackwell]
perf/test_perf_sanity.py::test_e2e[aggr_upload-deepseek_r1_fp4_v2_grace_blackwell-r1_fp4_v2_dep4_mtp1_1k1k]
perf/test_perf_sanity.py::test_e2e[aggr_upload-deepseek_r1_fp4_v2_grace_blackwell-r1_fp4_v2_tep4_mtp3_1k1k]
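
The two test case name patterns can be generated from a config file base name and an optional server config name. The sketch below only illustrates the string format shown above; the function names aggr_test_id and aggr_test_ids are hypothetical and are not part of test_perf_sanity.py.

```python
from typing import Iterable, List, Optional

# Test function path as it appears in the test case names above.
TEST_FUNCTION = "perf/test_perf_sanity.py::test_e2e"


def aggr_test_id(config_base_name: str,
                 server_config_name: Optional[str] = None) -> str:
    """Build an aggregated test ID (hypothetical helper).

    Without a server config name, the ID selects every server config in the
    YAML file; with one, it selects only that server config.
    """
    param = f"aggr_upload-{config_base_name}"
    if server_config_name is not None:
        param += f"-{server_config_name}"
    return f"{TEST_FUNCTION}[{param}]"


def aggr_test_ids(config_base_name: str,
                  server_config_names: Iterable[str]) -> List[str]:
    """Build one test ID per server config name (hypothetical helper)."""
    return [aggr_test_id(config_base_name, name) for name in server_config_names]


if __name__ == "__main__":
    base = "deepseek_r1_fp4_v2_grace_blackwell"
    print(aggr_test_id(base))
    for test_id in aggr_test_ids(base, ["r1_fp4_v2_dep4_mtp1_1k1k",
                                        "r1_fp4_v2_tep4_mtp3_1k1k"]):
        print(test_id)
```

Run as a script, this prints the three example test case names listed above.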

Disaggregated Mode (disagg)

Config Location: tests/integration/defs/perf/disagg/test_configs/disagg/perf

File Naming: xxx.yaml (names may contain hyphens, -).

File Example: deepseek-r1-fp4_1k1k_ctx1_gen1_dep8_bs768_eplb0_mtp0_ccb-UCX.yaml

Use Case: Disaggregated architecture, where the model runs across multiple nodes with separate context (prefill) and generation (decode) servers.

Test Case Name:

perf/test_perf_sanity.py::test_e2e[disagg_upload-{config yaml file base name}]

Example:

perf/test_perf_sanity.py::test_e2e[disagg_upload-deepseek-r1-fp4_1k1k_ctx1_gen1_dep8_bs768_eplb0_mtp0_ccb-UCX]
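
One plausible reading of the two naming rules is that the hyphen acts as a field separator in aggregated test IDs, while a disaggregated ID carries a single file base name that may itself contain hyphens. The sketch below decodes the bracketed part of a test ID under that assumption. It is not the parsing logic of test_perf_sanity.py, which may differ; PerfSanityCase and parse_test_param are hypothetical names.

```python
from typing import NamedTuple, Optional


class PerfSanityCase(NamedTuple):
    """Decoded test parameter (hypothetical structure)."""
    mode: str                          # "aggr" or "disagg"
    config_base_name: str              # YAML file name without the .yaml extension
    server_config_name: Optional[str]  # only set for aggregated IDs that name one


def parse_test_param(param: str) -> PerfSanityCase:
    """Split the bracketed part of a test ID into its fields.

    Assumption: in aggregated IDs the first hyphen after the config base name
    starts the server config name, which is why aggregated file names avoid
    hyphens. Disaggregated IDs keep everything after the prefix as the base name.
    """
    prefix, sep, rest = param.partition("-")
    if not sep or not rest:
        raise ValueError(f"missing config name in test parameter: {param!r}")
    if prefix == "aggr_upload":
        base, _, server = rest.partition("-")
        return PerfSanityCase("aggr", base, server or None)
    if prefix == "disagg_upload":
        return PerfSanityCase("disagg", rest, None)
    raise ValueError(f"unknown test parameter prefix: {prefix!r}")


if __name__ == "__main__":
    print(parse_test_param(
        "aggr_upload-deepseek_r1_fp4_v2_grace_blackwell-r1_fp4_v2_dep4_mtp1_1k1k"))
    print(parse_test_param(
        "disagg_upload-deepseek-r1-fp4_1k1k_ctx1_gen1_dep8_bs768_eplb0_mtp0_ccb-UCX"))
```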

Running Tests

Important: Do NOT add the --perf flag when running pytest. Perf sanity tests are static test cases and do not use perf mode.

# Test IDs are quoted so that the shell does not treat [...] as a glob pattern.

# Run all server configs in an aggregated test
pytest "perf/test_perf_sanity.py::test_e2e[aggr_upload-deepseek_r1_fp4_v2_grace_blackwell]"

# Run a specific server config in an aggregated test
pytest "perf/test_perf_sanity.py::test_e2e[aggr_upload-deepseek_r1_fp4_v2_grace_blackwell-r1_fp4_v2_dep4_mtp1_1k1k]"

# Run a specific disaggregated test
pytest "perf/test_perf_sanity.py::test_e2e[disagg_upload-deepseek-r1-fp4_1k1k_ctx1_gen1_dep8_bs768_eplb0_mtp0_ccb-UCX]"
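
When the tests are launched from a script rather than an interactive shell, passing the test ID as a single argument-list element avoids any shell interpretation of the square brackets. The sketch below is an illustration under assumptions: build_pytest_command and run_perf_sanity are hypothetical helpers, and the working directory from which perf/test_perf_sanity.py resolves is left to the caller because it is not stated here. The guard against --perf mirrors the note above.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_pytest_command(test_id: str,
                         extra_args: Sequence[str] = ()) -> List[str]:
    """Return the argument list for one perf sanity test (hypothetical helper).

    Rejects --perf, since perf sanity tests do not use perf mode.
    """
    if "--perf" in extra_args:
        raise ValueError("do not pass --perf to perf sanity tests")
    # "python -m pytest" uses the same interpreter that runs this script.
    return [sys.executable, "-m", "pytest", test_id, *extra_args]


def run_perf_sanity(test_id: str, cwd: Optional[str] = None) -> int:
    """Run the test and return pytest's exit code.

    cwd is an assumption: set it to the directory from which the relative
    path perf/test_perf_sanity.py resolves in your checkout.
    """
    return subprocess.run(build_pytest_command(test_id), cwd=cwd).returncode


if __name__ == "__main__":
    test_id = ("perf/test_perf_sanity.py::test_e2e"
               "[aggr_upload-deepseek_r1_fp4_v2_grace_blackwell]")
    # Print the command instead of running it; call run_perf_sanity to execute.
    print(build_pytest_command(test_id))
```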