
TensorRT LLM test definitions

This folder contains test definitions for TensorRT LLM.

Directory structure

.
└── integration              # Root directory for integration tests
    ├── defs            #     Test definitions
    ├── perf_configs    #     Configs for perf tests
    └── test_lists      #     Test lists
        ├── test-db     #         Test-DB that is the test list convention adopted by CI
        ├── dev         #         Other test lists used by TRT LLM developers
        ├── qa          #         Test lists used by QA
        └── waives.txt  #         Test waive list

To run perf tests, you also need to first build the C++ benchmarks by calling build_wheel.py with the --benchmarks flag.
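
A minimal sketch of that build step (it assumes build_wheel.py sits under scripts/ in the repo root; adjust the path for your checkout):

# build the TensorRT LLM wheel together with the cpp benchmarks
# LLM_ROOT is a placeholder for your local source checkout
cd LLM_ROOT
python3 scripts/build_wheel.py --benchmarks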

Run perf tests

All perf test names are of the form perf/test_perf.py::test_perf[...], where the ... part encodes the test parameters.

Below are some pytest options specific to perf tests:

# execute these in the tensorrt-llm source repo root dir.
# install dependencies; this does not need to be done every time if they are already installed.
pip install -r requirements-dev.txt

# example 1: run a test case
# For example, if QA reports a perf bug for `perf/test_perf.py::test_perf[llama_7b-cppmanager-exe-plugin_ifb-float16-input_output_len:128,128,+512,32]`, then you can repro it by running:
cd LLM_ROOT/tests/integration/defs
echo "perf/test_perf.py::test_perf[llama_7b-cppmanager-exe-plugin_ifb-float16-input_output_len:128,128,+512,32]" > perf.txt
pytest --perf --test-list=perf.txt --output-dir=/workspace/test-log --perf-log-formats csv --perf-log-formats yaml
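
Several cases can also be batched into one list file. A sketch follows; the individual test parameter strings are illustrative, so substitute the exact names reported by QA or listed under test_lists:

# example 2: run multiple test cases from one list file
cat > perf.txt <<EOF
perf/test_perf.py::test_perf[llama_7b-cppmanager-exe-plugin_ifb-float16-input_output_len:128,128]
perf/test_perf.py::test_perf[llama_7b-cppmanager-exe-plugin_ifb-float16-input_output_len:512,32]
EOF
pytest --perf --test-list=perf.txt --output-dir=/workspace/test-log --perf-log-formats csv --perf-log-formats yaml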

The captured perf metrics are saved to /workspace/test-log/perf_scripts_test_results.csv or /workspace/test-log/perf_scripts_test_results.yaml, depending on the --perf-log-formats option, and the test logs are saved to /workspace/test-log/result.xml. Currently, we capture these perf metrics (a sketch for inspecting the result files follows the list):

  1. test_perf_metric_build_time: The engine building time in seconds.
  2. test_perf_metric_build_peak_cpu_memory: The build-phase peak CPU mem usage in MB.
  3. test_perf_metric_build_peak_gpu_memory: The build-phase peak GPU mem usage in MB.
  4. test_perf_metric_inference_time: The inference latency in ms.
  5. test_perf_metric_inference_peak_gpu_memory: The inference-phase peak GPU mem usage in GB.
  6. test_perf_metric_context_gpu_memory: The context GPU mem usage in MB.
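
A quick way to eyeball the captured metrics after a run; this sketch only assumes the output paths used in the command above:

# pretty-print the CSV results as a table and dump the YAML results
column -s, -t < /workspace/test-log/perf_scripts_test_results.csv
cat /workspace/test-log/perf_scripts_test_results.yaml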

Common issues and solutions

  1. No package 'libffi' found: install libffi with sudo apt-get install libffi-dev and rerun.
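
For item 1, a minimal fix sketch, assuming a Debian/Ubuntu environment with sudo available:

# install the libffi development headers, then rerun the failing pytest command
sudo apt-get update && sudo apt-get install -y libffi-dev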