[None][doc] add introduction doc on qa test (#6535)

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
@@ -0,0 +1,108 @@
# Description
This folder contains QA test definitions for TensorRT-LLM, which are executed on a daily/release schedule. These tests focus on end-to-end validation, accuracy verification, disaggregated testing, and performance benchmarking.
## Test Categories
QA tests are organized into three main categories:
### 1. Functional Tests
Functional tests include E2E (end-to-end), accuracy, and disaggregated test cases:
- **E2E Tests**: Complete workflow validation from model loading to inference output
- **Accuracy Tests**: Model accuracy verification against reference implementations
- **Disaggregated Tests**: Distributed deployment and multi-node scenario validation
### 2. Performance Tests
Performance tests focus on benchmarking and performance validation:
- Baseline performance measurements
- Performance regression detection
- Throughput and latency benchmarking
- Resource utilization analysis
### 3. Triton Backend Tests
Triton backend tests validate the integration with NVIDIA Triton Inference Server:
- Backend functionality validation
- Model serving capabilities
- API compatibility testing
- Integration performance testing
## Dependencies
The following Python packages are required for running QA tests:
```bash
pip install mako oyaml rouge_score lm_eval
```
### Dependency Details
- **mako**: Template engine for test generation and configuration
- **oyaml**: YAML parser with ordered dictionary support
- **rouge_score**: ROUGE evaluation metrics for text generation quality assessment
- **lm_eval**: Language model evaluation framework
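After installation, a quick sanity check that all four packages are importable (a sketch, assuming the same Python environment used for the `pip install` above):

```bash
# Verify the QA test dependencies are importable in the current environment
python -c "import mako, oyaml, rouge_score, lm_eval; print('QA test dependencies OK')"
```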
## Test Files
This directory contains various test configuration files:
### Functional Test Lists
- `llm_function_full.txt` - Primary test list for single-node multi-GPU scenarios (all new test cases should be added here)
- `llm_function_sanity.txt` - Subset of example tests for quick validation of the PyTorch flow
- `llm_function_nim.txt` - NIM-specific functional test cases
- `llm_function_multinode.txt` - Multi-node functional test cases
- `llm_function_gb20x.txt` - GB20X release test cases
- `llm_function_rtx6kd.txt` - RTX 6000 Ada-specific tests
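These functional test lists are plain-text files consumed through pytest's `--test-list` option (see Running Tests below). Assuming the usual convention, each non-empty line is a pytest node ID; an illustrative excerpt:

```text
accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
```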
### Performance Test Files
- `llm_perf_full.yml` - Main performance test configuration
- `llm_perf_cluster.yml` - Cluster-based performance tests
- `llm_perf_sanity.yml` - Performance sanity checks
- `llm_perf_nim.yml` - NIM-specific performance tests
- `llm_trt_integration_perf.yml` - Integration performance tests
- `llm_trt_integration_perf_sanity.yml` - Integration performance sanity checks
### Triton Backend Tests
- `llm_triton_integration.txt` - Triton backend integration tests
### Release-Specific Tests
- `llm_digits_func.txt` - Functional tests for DIGITS release
- `llm_digits_perf.txt` - Performance tests for DIGITS release
## Test Execution Schedule
QA tests are executed on a regular schedule:
- **Weekly**: Automated regression testing
- **Release**: Comprehensive validation before each release
- **On-demand**: Manual execution for specific validation needs
## Running Tests
### Manual Execution
To run specific test categories:
```bash
# Change to the defs folder
cd tests/integration/defs
# Run all FP8 functional tests
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_full.txt -k fp8
# Run a single test case
pytest -vs accuracy/test_cli_flow.py::TestLlama3_1_8B::test_auto_dtype
```
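The same invocation pattern can be pointed at any of the other test lists; for example, a quicker smoke run against the sanity list (a sketch, assuming it is executed the same way as the full list):

```bash
# Run the smaller sanity list for a quick check (from tests/integration/defs)
pytest --no-header -vs --test-list=../test_lists/qa/llm_function_sanity.txt
```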
### Automated Execution
QA tests are typically executed through CI/CD pipelines with appropriate test selection based on:
- Release requirements
- Hardware availability
- Test priority and scope
## Test Guidelines
### Adding New Test Cases
- **Primary Location**: For functional testing, new test cases should be added to `llm_function_full.txt` first
- **Categorization**: Test cases should be categorized based on their scope and execution time
- **Validation**: Ensure test cases are properly validated before adding to any test list
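To make the first guideline concrete, a hedged sketch of adding and locally validating a new functional case (the node ID `TestMyNewModel` is hypothetical; substitute the real node ID of your test):

```bash
# Append the new pytest node ID to the primary functional list
# (hypothetical test ID shown; use the actual node ID of your test case)
echo "accuracy/test_cli_flow.py::TestMyNewModel::test_auto_dtype" >> tests/integration/test_lists/qa/llm_function_full.txt

# Validate the case locally before it lands in any QA list
cd tests/integration/defs
pytest -vs accuracy/test_cli_flow.py::TestMyNewModel::test_auto_dtype
```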