## Description
This folder contains QA test definitions for TensorRT-LLM, which are executed on a recurring schedule and before each release. These tests focus on end-to-end validation, accuracy verification, disaggregated testing, and performance benchmarking.
## Test Categories
QA tests are organized into three main categories:
### 1. Functional Tests
Functional tests include E2E (end-to-end), accuracy, and disaggregated test cases:
- E2E Tests: Complete workflow validation from model loading to inference output
- Accuracy Tests: Model accuracy verification against reference implementations
- Disaggregated Tests: Distributed deployment and multi-node scenario validation
### 2. Performance Tests
Performance tests focus on benchmarking and performance validation:
- Baseline performance measurements
- Performance regression detection
- Throughput and latency benchmarking
- Resource utilization analysis
### 3. Triton Backend Tests
Triton backend tests validate the integration with NVIDIA Triton Inference Server:
- Backend functionality validation
- Model serving capabilities
- API compatibility testing
- Integration performance testing
## Dependencies
The following Python packages are required for running QA tests:
```bash
pip3 install -r ${TensorRT-LLM_PATH}/requirements-dev.txt
```
### Dependency Details
- `mako`: Template engine for test generation and configuration
- `oyaml`: YAML parser with ordered dictionary support
- `rouge_score`: ROUGE evaluation metrics for text generation quality assessment
- `lm_eval`: Language model evaluation framework
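To confirm the environment is ready before a QA run, a quick import check can help. The loop below is a minimal sketch; it assumes each pip package above exposes a Python module of the same name.

```bash
# Minimal sketch: verify the QA dependencies listed above are importable.
# Assumes each pip package exposes a module of the same name.
for module in mako oyaml rouge_score lm_eval; do
    python3 -c "import ${module}" >/dev/null 2>&1 \
        && echo "${module}: OK" \
        || echo "${module}: MISSING"
done
```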
## Test Files
This directory contains various test configuration files:
### Functional Test Lists
- `llm_function_core.txt` - Primary test list for single-node multi-GPU scenarios (all new test cases should be added here)
- `llm_function_core_sanity.txt` - Subset of examples for quick torch-flow validation
- `llm_function_nim.txt` - NIM-specific functional test cases
- `llm_function_multinode.txt` - Multi-node functional test cases
- `llm_function_gb20x.txt` - GB20X release test cases
- `llm_function_rtx6k.txt` - RTX 6000 series specific tests
- `llm_function_l20.txt` - L20-specific tests; contains single-GPU cases only
### Performance Test Files
- `llm_perf_core.yml` - Main performance test configuration
- `llm_perf_cluster.yml` - Cluster-based performance tests
- `llm_perf_cluster_nim.yml` - Cluster-based NIM performance tests
- `llm_perf_sanity.yml` - Performance sanity checks
- `llm_perf_nim.yml` - NIM-specific performance tests
- `llm_trt_integration_perf.yml` - TRT integration performance tests
- `llm_trt_integration_perf_sanity.yml` - TRT integration performance sanity checks
### Triton Backend Tests
- `llm_triton_integration.txt` - Triton backend integration tests
### Release-Specific Tests
- `llm_digits_func.txt` - Functional tests for the DIGITS release
- `llm_digits_perf.txt` - Performance tests for the DIGITS release
## Test Execution Schedule
QA tests are executed on a regular schedule:
- Weekly: Automated regression testing
- Release: Comprehensive validation before each release
- Full Cycle Testing: run `llm_function_core.txt` on all GPUs, plus `llm_function_nim.txt` on NIM-specific GPUs
- Sanity Cycle Testing: run `llm_function_core_sanity.txt` on all GPUs
- NIM Cycle Testing: run `llm_function_core_sanity.txt` on all GPUs, plus `llm_function_nim.txt` on NIM-specific GPUs
- On-demand: Manual execution for specific validation needs
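For reference, a full cycle can be approximated manually by chaining the two relevant lists using the pytest pattern shown under Running Tests below. This is only a sketch; the official cycles are driven by CI, and targeting "all GPUs" versus "NIM-specific GPUs" is assumed to be handled by the machines the commands run on.

```bash
# Sketch of a manual "Full Cycle" run: the core list everywhere, plus the
# NIM list on NIM-specific GPUs. GPU targeting is handled by where this runs.
cd tests/integration/defs
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_core.txt
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_nim.txt
```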
## Running Tests
### Manual Execution
To run specific test categories:
```bash
# Navigate to the defs folder
cd tests/integration/defs

# Run all FP8 functional tests
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_core.txt -k fp8

# Run a single test case
pytest -vs accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
```
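Before launching a long run, it can be useful to preview which cases a list plus keyword filter will select. The sketch below uses pytest's standard `--collect-only` flag; whether it combines with the harness's `--test-list` option exactly as shown is an assumption.

```bash
# Preview the selected test cases without executing them (collection only).
pytest --no-header -q --collect-only --test-list=../test_lists/qa/llm_function_core.txt -k fp8
```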
### Automated Execution
QA tests are typically executed through CI/CD pipelines with appropriate test selection based on:
- Release requirements
- Hardware availability
- Test priority and scope
## Test Guidelines
### Adding New Test Cases
- Primary Location: For functional testing, new test cases should be added to `llm_function_core.txt` first
- Categorization: Test cases should be categorized based on their scope and execution time
- Validation: Ensure test cases are properly validated before adding to any test list (a minimal workflow sketch follows below)
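As a sanity workflow, one option is to append the new pytest node ID to the primary list and then run just that case in isolation. The node ID below reuses the example from Manual Execution and is purely illustrative.

```bash
# Illustrative workflow: add a node ID to the primary list, then validate it
# by running the case on its own before relying on the list entry.
cd tests/integration/defs
echo "accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype" >> ../test_lists/qa/llm_function_core.txt
pytest -vs accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
```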