## Description
This folder contains QA test definitions for TensorRT-LLM, which are executed on a recurring schedule and before each release. These tests focus on end-to-end validation, accuracy verification, disaggregated testing, and performance benchmarking.
## Test Categories
QA tests are organized into three main categories:
### 1. Functional Tests
Functional tests include E2E (end-to-end), accuracy, and disaggregated test cases:
- E2E Tests: Complete workflow validation from model loading to inference output
- Accuracy Tests: Model accuracy verification against reference implementations
- Disaggregated Tests: Distributed deployment and multi-node scenario validation
### 2. Performance Tests
Performance tests focus on benchmarking and performance validation:
- Baseline performance measurements
- Performance regression detection
- Throughput and latency benchmarking
- Resource utilization analysis
### 3. Triton Backend Tests
Triton backend tests validate the integration with NVIDIA Triton Inference Server:
- Backend functionality validation
- Model serving capabilities
- API compatibility testing
- Integration performance testing
## Dependencies
The following Python packages are required for running QA tests:
```bash
pip3 install -r ${TensorRT-LLM_PATH}/requirements-dev.txt
```
### Dependency Details
- `mako`: Template engine for test generation and configuration
- `oyaml`: YAML parser with ordered dictionary support
- `rouge_score`: ROUGE evaluation metrics for text generation quality assessment
- `lm_eval`: Language model evaluation framework
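To confirm the environment is ready before a QA run, a quick import check can help. The loop below is a minimal sketch; it assumes each pip package above exposes a Python module of the same name.

```bash
# Minimal sketch: verify the QA dependencies listed above are importable.
# Assumes each pip package exposes a module of the same name.
for module in mako oyaml rouge_score lm_eval; do
    python3 -c "import ${module}" >/dev/null 2>&1 \
        && echo "${module}: OK" \
        || echo "${module}: MISSING"
done
```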
## Test Files
This directory contains various test configuration files:
### Functional Test Lists
- `llm_function_core.txt` - Primary test list for single-node multi-GPU scenarios (all new test cases should be added here)
- `llm_function_core_sanity.txt` - Subset of examples for quick torch-flow validation
- `llm_function_nim.txt` - NIM-specific functional test cases
- `llm_function_multinode.txt` - Multi-node functional test cases
- `llm_function_gb20x.txt` - GB20X release test cases
- `llm_function_rtx6k.txt` - RTX 6000 series specific tests
- `llm_function_l20.txt` - L20-specific tests; contains single-GPU cases only
### Performance Test Files
- `llm_perf_core.yml` - Main performance test configuration
- `llm_perf_cluster.yml` - Cluster-based performance tests
- `llm_perf_cluster_nim.yml` - Cluster-based NIM performance tests
- `llm_perf_sanity.yml` - Performance sanity checks
- `llm_perf_nim.yml` - NIM-specific performance tests
- `llm_trt_integration_perf.yml` - TRT integration performance tests
- `llm_trt_integration_perf_sanity.yml` - TRT integration performance sanity checks
### Triton Backend Tests
- `llm_triton_integration.txt` - Triton backend integration tests
### Release-Specific Tests
- `llm_digits_func.txt` - Functional tests for the DIGITS release
- `llm_digits_perf.txt` - Performance tests for the DIGITS release
## Test Execution Schedule
QA tests are executed on a regular schedule:
- Weekly: Automated regression testing
- Release: Comprehensive validation before each release
- Full Cycle Testing: run `llm_function_core.txt` on all GPUs, plus `llm_function_nim.txt` on NIM-specific GPUs
- Sanity Cycle Testing: run `llm_function_core_sanity.txt` on all GPUs
- NIM Cycle Testing: run `llm_function_core_sanity.txt` on all GPUs, plus `llm_function_nim.txt` on NIM-specific GPUs
- On-demand: Manual execution for specific validation needs
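For reference, a full cycle can be approximated manually by chaining the two relevant lists using the pytest pattern shown under Running Tests below. This is only a sketch; the official cycles are driven by CI, and targeting "all GPUs" versus "NIM-specific GPUs" is assumed to be handled by the machines the commands run on.

```bash
# Sketch of a manual "Full Cycle" run: the core list everywhere, plus the
# NIM list on NIM-specific GPUs. GPU targeting is handled by where this runs.
cd tests/integration/defs
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_core.txt
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_nim.txt
```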
## Running Tests
### Manual Execution
To run specific test categories:
```bash
# Navigate to the defs folder
cd tests/integration/defs

# Run all FP8 functional tests
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_core.txt -k fp8

# Run a single test case
pytest -vs accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
```
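Before launching a long run, it can be useful to preview which cases a list plus keyword filter will select. The sketch below uses pytest's standard `--collect-only` flag; whether it combines with the harness's `--test-list` option exactly as shown is an assumption.

```bash
# Preview the selected test cases without executing them (collection only).
pytest --no-header -q --collect-only --test-list=../test_lists/qa/llm_function_core.txt -k fp8
```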
### Automated Execution
QA tests are typically executed through CI/CD pipelines with appropriate test selection based on:
- Release requirements
- Hardware availability
- Test priority and scope
## Test Guidelines
### Adding New Test Cases
- Primary Location: For functional testing, new test cases should be added to `llm_function_core.txt` first
- Categorization: Test cases should be categorized based on their scope and execution time
- Validation: Ensure test cases are properly validated before adding to any test list (a minimal workflow sketch follows below)
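As a sanity workflow, one option is to append the new pytest node ID to the primary list and then run just that case in isolation. The node ID below reuses the example from Manual Execution and is purely illustrative.

```bash
# Illustrative workflow: add a node ID to the primary list, then validate it
# by running the case on its own before relying on the list entry.
cd tests/integration/defs
echo "accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype" >> ../test_lists/qa/llm_function_core.txt
pytest -vs accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
```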