TensorRT LLM test definitions

This folder contains the integration test definitions for TensorRT LLM.

Directory structure

```
.
└── integration              # Root directory for integration tests
    ├── defs            #     Test definitions
    ├── perf_configs    #     Configs for perf tests
    └── test_lists      #     Test lists
        ├── test-db     #         Test-DB, the test list convention adopted by CI
        ├── dev         #         Other test lists used by TRT LLM developers
        ├── qa          #         Test lists used by QA
        └── waives.txt  #         Test waive list
```

Test Waives

The waives.txt file supports skipping tests based on:

Platform name: e.g., full:RTX/, full:DGX-A100-40GB/, full:GH200/
SM version: e.g., full:sm90/, full:sm89/, full:sm100/
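As an illustration, the shell sketch below lists the waives.txt entries scoped to one platform name or SM version. It assumes, without confirmation from this README, that a scoped entry is a single line starting with the full:<scope>/ prefix followed by the test name. waives_for_scope is a hypothetical helper, not a script shipped with the repo.

```bash
# Hypothetical helper (not part of TensorRT LLM): print the waives.txt lines
# scoped to one platform name (e.g. GH200) or SM version (e.g. sm90).
# ASSUMPTION: a scoped entry starts with "full:<scope>/" followed by the test name.
waives_for_scope() {
  local scope="$1"
  local waive_file="$2"
  # index(...) == 1 is a literal prefix match, so "-" or "." in the scope
  # (e.g. DGX-A100-40GB) is never treated as a regex character.
  awk -v prefix="full:${scope}/" 'index($0, prefix) == 1' "$waive_file"
}
```

Run from the repo root, waives_for_scope sm90 tests/integration/test_lists/waives.txt would print only the entries scoped to sm90, provided the assumed entry format holds. The trailing slash in the prefix keeps a tag such as sm10 from matching sm100 entries.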

SM version mapping:

sm89 = Ada (e.g., RTX 4090, L40S)
sm90 = Hopper (e.g., H100, H200)
sm100 = Blackwell (e.g., B100, B200)
sm103 = Blackwell-Ultra (e.g., B300, GB300)
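To find which sm tag applies to the local GPU, the compute capability reported by the driver can be mapped to this naming by dropping the dot: 9.0 becomes sm90 and 10.3 becomes sm103. The sketch below assumes a driver whose nvidia-smi supports the compute_cap query field; both function names are hypothetical.

```bash
# Hypothetical helper: turn a compute capability string ("8.9", "9.0", "10.3")
# into the sm tag used above ("sm89", "sm90", "sm103") by removing the dot.
cc_to_sm() {
  local cc="$1"
  printf 'sm%s\n' "${cc/./}"
}

# Hypothetical helper: sm tag of the first visible GPU.
# ASSUMPTION: the installed nvidia-smi supports --query-gpu=compute_cap.
local_sm() {
  local cc
  cc="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n 1 | tr -d '[:space:]')"
  cc_to_sm "$cc"
}
```

On an H100, local_sm should print sm90, matching the Hopper row of the mapping.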

Run perf tests

To run perf tests, you first need to build the C++ benchmarks by running build_wheel.py with the --benchmarks flag.

All perf test names have the form perf/test_perf.py::test_perf[...], where the ... part holds the test parameters.
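As a rough aid for reading these names, the hypothetical helper below extracts the bracketed parameter string and prints its dash-separated fields, one per line. It assumes that fields are joined with "-" and that no single field contains a "-" itself, which this README does not guarantee; it also makes no attempt to interpret what each field means.

```bash
# Hypothetical helper: print the dash-separated fields inside the [...] of a
# perf test name, one per line.
# ASSUMPTION: fields are joined with "-" and never contain "-" themselves.
split_perf_params() {
  local name="$1"
  local params="${name#*\[}"   # drop everything up to and including the first "["
  params="${params%\]}"        # drop the trailing "]"
  local IFS='-'
  local -a fields
  read -r -a fields <<< "$params"
  printf '%s\n' "${fields[@]}"
}
```

For the test name used in the example below, this prints llama_7b, cppmanager, exe, plugin_ifb, float16 and input_output_len:128,128,+512,32 on separate lines.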

The example below shows the pytest options specific to perf tests:

```bash
# Execute these commands from the TensorRT LLM source repo root.
# Install dependencies (skip this step if they are already installed).
pip install -r requirements-dev.txt

# Example: reproduce a single perf test case.
# If QA reports a perf bug for `perf/test_perf.py::test_perf[llama_7b-cppmanager-exe-plugin_ifb-float16-input_output_len:128,128,+512,32]`, you can reproduce it by running:
# (LLM_ROOT is the path to the TensorRT LLM source repo root.)
cd "$LLM_ROOT/tests/integration/defs"
echo "perf/test_perf.py::test_perf[llama_7b-cppmanager-exe-plugin_ifb-float16-input_output_len:128,128,+512,32]" > perf.txt
pytest --perf --test-list=perf.txt --output-dir=/workspace/test-log --perf-log-formats csv --perf-log-formats yaml
```
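When reproducing several reports, the repro steps can be wrapped in a small function. This is a hypothetical convenience wrapper, not a repo script: it writes the given test names to perf.txt and runs the same pytest command shown above. It assumes the current directory is tests/integration/defs and that a test list accepts several test names, one per line.

```bash
# Hypothetical wrapper around the repro steps.
# Usage: repro_perf_tests OUTPUT_DIR TEST_NAME [TEST_NAME ...]
# ASSUMPTION: run from tests/integration/defs; the test list takes one name per line.
repro_perf_tests() {
  local out_dir="$1"
  shift
  # Write every requested test name on its own line.
  printf '%s\n' "$@" > perf.txt
  pytest --perf --test-list=perf.txt --output-dir="$out_dir" \
    --perf-log-formats csv --perf-log-formats yaml
}
```

For example, call repro_perf_tests /workspace/test-log followed by one or more full perf test names, each quoted so the shell leaves the brackets alone.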

The captured perf metrics are saved to /workspace/test-log/perf_scripts_test_results.csv and/or /workspace/test-log/perf_scripts_test_results.yaml, depending on the formats passed to --perf-log-formats, and the test logs are saved to /workspace/test-log/result.xml. Currently, the following perf metrics are captured:

test_perf_metric_build_time: Engine build time, in seconds.
test_perf_metric_build_peak_cpu_memory: Peak CPU memory usage during the build phase, in MB.
test_perf_metric_build_peak_gpu_memory: Peak GPU memory usage during the build phase, in MB.
test_perf_metric_inference_time: Inference latency, in ms.
test_perf_metric_inference_peak_gpu_memory: Peak GPU memory usage during the inference phase, in GB.
test_perf_metric_context_gpu_memory: Context GPU memory usage, in MB.

Common Issues and solutions

No package 'libffi' found: install libffi with sudo apt-get install libffi-dev, then rerun.
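This message is typically printed by pkg-config when it cannot locate libffi's development files. The sketch below installs libffi-dev only when pkg-config cannot find libffi. It assumes a Debian or Ubuntu system with pkg-config and sudo available; ensure_libffi is a hypothetical name.

```bash
# Hypothetical helper: install libffi-dev only when pkg-config cannot find libffi.
# ASSUMPTION: Debian/Ubuntu system with pkg-config and sudo available.
ensure_libffi() {
  if pkg-config --exists libffi; then
    echo "libffi found"
    return 0
  fi
  # -y answers "yes" to apt-get's confirmation prompt.
  sudo apt-get install -y libffi-dev
}
```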