TensorRT-LLMs/cpp/tests
Robin Kobus b7a38feb14
chore: Clean up cpp runtime (#3537)
* add space in test output

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* perf: reduce executor lock scope

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* refactor: Move TokenRangeRetentionConfig implementation to cpp file

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: Improve finished steps handling for external draft tokens

- Fixed a bug where the whole finished steps tensor was being zeroes instead of the slices.
- Replaced the creation of a temporary tensor for finished steps with a direct slice from the input tensor, improving efficiency and readability.
- Updated the tensor management logic to streamline the process of setting zero values for finished steps during batch processing.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* chore: Clean up includes

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-04-15 16:06:14 +08:00
..
batch_manager feat: Allow individual gatherContext for each additional output (#3374) 2025-04-12 17:00:36 +08:00
executor chore: Clean up cpp runtime (#3537) 2025-04-15 16:06:14 +08:00
kernels Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
layers Update TensorRT-LLM (#2755) 2025-02-11 03:01:00 +00:00
resources Update TensorRT-LLM (#2873) 2025-03-11 21:13:42 +08:00
runtime refactor: batch slot management in decoder classes (#3300) 2025-04-13 05:05:13 +08:00
unit_tests feat: Add FP8 support for SM 120 (#3248) 2025-04-14 16:05:41 -07:00
utils feat: Allow individual gatherContext for each additional output (#3374) 2025-04-12 17:00:36 +08:00
CMakeLists.txt chore: Stabilize ABI boundary for internal kernel library (#3117) 2025-04-11 15:07:50 +08:00
README.md chore: Clean up cpp runtime (#3505) 2025-04-14 18:00:03 +08:00

C++ Tests

This document explains how to build and run the C++ tests, and the included resources.

All-in-one script

The Pytest script test_cpp.py builds TRT-LLM, builds engines, and generates expected outputs and executes the C++ tests all in one go. To get an overview of the tests and their parameterization, call:

pytest tests/integration/defs/test_cpp.py --collect-only

All tests take the number of the CUDA architecture of the GPU you wish to use as a parameter e.g. 90 for Hopper.

It is possible to choose unit tests or a single model for end-to-end tests. Example calls could look like this:

export LLM_MODELS_ROOT="/path/to/model_cache"

pytest tests/integration/defs/test_cpp.py::test_unit_tests[90]

pytest tests/integration/defs/test_cpp.py::test_model[llama-90]

pytest tests/integration/defs/test_cpp.py::test_benchmarks[gpt-90]

pytest tests/integration/defs/test_cpp.py::test_multi_gpu[90]

Manual steps

Compile

From the top-level directory call:

CPP_BUILD_DIR=cpp/build
python3 scripts/build_wheel.py -a "80-real;86-real" --build_dir ${CPP_BUILD_DIR}
pip install -r requirements-dev.txt
pip install build/tensorrt_llm*.whl
cd $CPP_BUILD_DIR && make -j$(nproc) google-tests

Single tests can be executed from CPP_BUILD_DIR/tests, e.g.

./$CPP_BUILD_DIR/tests/allocatorTest

End-to-end tests

gptSessionTest,trtGptModelRealDecoderTest and executorTest require pre-built TensorRT engines, which are loaded in the tests. They also require data files which are stored in cpp/tests/resources/data.

Build engines

Scripts are provided that download the GPT2 and GPT-J models from Huggingface and convert them to TensorRT engines. The weights and built engines are stored under cpp/tests/resources/models. To build the engines from the top-level directory:

PYTHONPATH=examples/gpt:$PYTHONPATH python3 cpp/tests/resources/scripts/build_gpt_engines.py
PYTHONPATH=examples/gptj:$PYTHONPATH python3 cpp/tests/resources/scripts/build_gptj_engines.py
PYTHONPATH=examples/llama:$PYTHONPATH python3 cpp/tests/resources/scripts/build_llama_engines.py
PYTHONPATH=examples/chatglm:$PYTHONPATH python3 cpp/tests/resources/scripts/build_chatglm_engines.py
PYTHONPATH=examples/medusa:$PYTHONPATH python3 cpp/tests/resources/scripts/build_medusa_engines.py
PYTHONPATH=examples/eagle:$PYTHONPATH python3 cpp/tests/resources/scripts/build_eagle_engines.py
PYTHONPATH=examples/redrafter:$PYTHONPATH python3 cpp/tests/resources/scripts/build_redrafter_engines.py

It is possible to build engines with tensor and pipeline parallelism for LLaMA using 4 GPUs.

PYTHONPATH=examples/llama python3 cpp/tests/resources/scripts/build_llama_engines.py --only_multi_gpu

If there is an issue finding model_spec.so in engine building, manually build model_spec.so by

make -C cpp/build/ modelSpec

Generate expected output

End-to-end tests read inputs and expected outputs from Numpy files located at cpp/tests/resources/data. The expected outputs can be generated using scripts which employ the Python runtime to run the built engines:

PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_gpt_output.py
PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_gptj_output.py
PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_llama_output.py
PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_chatglm_output.py
PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_medusa_output.py
PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_eagle_output.py
PYTHONPATH=examples:$PYTHONPATH python3 cpp/tests/resources/scripts/generate_expected_redrafter_output.py

Generate data with tensor and pipeline parallelism

It is possible to generate tensor and pipeline parallelism data for LLaMA using 4 GPUs. To generate results from the top-level directory:

PYTHONPATH=examples mpirun -n 4 python3 cpp/tests/resources/scripts/generate_expected_llama_output.py --only_multi_gpu

Run test

After building the engines and generating the expected output execute the tests

./$CPP_BUILD_DIR/tests/gptSessionTest

Run all tests with ctest

To run all tests and produce an xml report, call

./$CPP_BUILD_DIR/ctest --output-on-failure --output-junit "cpp-test-report.xml"