How to run TRT-LLM tests

1. Unit test (Python)

All the tests contained in the unittest directory are considered "unit tests" in this doc. These tests can use the Python standard unittest framework or pytest. Since pytest is compatible with the unittest framework, we use pytest to launch them in CI.

Unit tests should be small, fast, and test only a specific function.

If you need to run them locally, the only dependencies are the packages listed in requirements-dev.txt.

# in tensorrt-llm source repo root dir
# use an editable install, so that your local changes are used immediately in the tests without another install
# see https://setuptools.pypa.io/en/latest/userguide/development_mode.html
pip install -e ./

# the pytest and required plugins used are listed in the requirements-dev.txt
pip install -r requirements-dev.txt

cd tests/
## There are multiple ways to tell pytest to launch a subset of the targeted test cases

# example 1: run all the tests under this directory, ignoring the integration tests. WARNING: this can take a very long time
pytest ./

# example 2: run a single test file
pytest ./test_builder.py

# example 3: run all the tests in a subfolder
pytest ./functional

# example 4: run tests whose names contain a substring
pytest -k test_basic_builder_flow

2. Integration test (Python)

All the integration tests are launched by pytest. They are currently all located in tests/integration/defs.

You can read the official pytest documentation for details: https://docs.pytest.org/en/stable/

Prepare model files (Non-NVIDIA developers)

Many integration tests rely on real model data. To run the integration tests correctly, you must place all needed models in a directory and set the environment variable LLM_MODELS_ROOT to point to it.

The expected subdirectory hierarchy for each model can be found in the codebase; for example, see bert_example_root in integration/defs/conftest.py.
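For example, a quick way to find the expected subdirectory name for a model family is to search the fixture definitions (bert_example_root is the fixture mentioned above; other models follow the same pattern):

# from the tests/ directory
grep -n "bert_example_root" integration/defs/conftest.py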

Examples of running integration tests locally:

export LLM_MODELS_ROOT=/path-to-models

# in root dir
pip install -r requirements-dev.txt
cd tests/integration/defs

# example 1: run a case
pytest "accuracy/test_llm_api_pytorch.py::TestLlama3_1_8B::test_auto_dtype"

# example 2: run a test list
pytest --rootdir . --test-list=<a txt file containing one test case per line>

# example 3: list all the cases.
pytest --co -q

# example 4: run all the tests which contain this substring
pytest -k test_llm_gpt2_medium_bad_words_1gpu

# example 5: run all tests which match this regexp
pytest -R ".*test_llm_gpt2_medium_bad_words_1gpu.*non.*py.*"

# example 6: list all the cases containing a substring
pytest -k llmapi --co -q

You can set the output directory for logs/runtime data using the --output-dir flag. For more options, refer to pytest --help, paying attention to the "Custom options added for TRT-LLM" section.
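As a sketch, combining the options above (the case name already appears in the examples; the output path is arbitrary):

# keep logs and runtime data for one case under a scratch directory
pytest "accuracy/test_llm_api_pytorch.py::TestLlama3_1_8B::test_auto_dtype" --output-dir=/tmp/llm_integration_test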

Common issues:

  1. trtllm-build: not found

    Many of the test cases use the trtllm-build command to build engines. If you see the error trtllm-build: not found, add the trtllm-build location to your PATH environment variable before launching pytest. Normally, if you install TRT-LLM into $HOME/.local or install it in place with pip install -e ./, the trtllm-build command is located in $HOME/.local/bin.

    Thus you should run export PATH=$HOME/.local/bin:$PATH before running pytest, as sketched below.
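    A minimal sketch of the fix, assuming the console scripts were installed under $HOME/.local/bin:

    # make the trtllm-build console script visible to the shell that launches pytest
    export PATH=$HOME/.local/bin:$PATH
    which trtllm-build   # should now resolve to $HOME/.local/bin/trtllm-build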

  2. The LLM_MODELS_ROOT is not set correctly

        AssertionError: ...llm-models/gpt2-medium does not exist, and fail_if_path_is_invalid is True, please check the cache directory
        assert False
    
      conftest.py:149: AssertionError
    

    If you see the above failure when running pytest locally, it's likely that you didn't set the LLM_MODELS_ROOT environment variable correctly. The default value is an NVIDIA-internal path that is used in the CI environment.

    After you set up the model directory, remember to mount it into the docker container, as sketched below.
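    A minimal sketch, assuming the model cache lives at /scratch/llm-models on the host (the path and image name are placeholders):

    export LLM_MODELS_ROOT=/scratch/llm-models
    # when running the tests inside a container, mount the same path and propagate the variable
    docker run --gpus all -v /scratch/llm-models:/scratch/llm-models -e LLM_MODELS_ROOT=/scratch/llm-models <trtllm-dev-image> bash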

3. C++ runtime test

TRT-LLM C++ runtime tests use the GoogleTest framework, and pytest is used to run sets of these tests.

The C++ runtime tests rely on the TRT-LLM Python frontend to generate engines as test data, so the C++ test resources directory contains scripts that generate those engines. Pytest calls these scripts from fixtures before launching the test cases.

Details on usage of the resources scripts can be found in the C++ Test document.
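For example, one of the pytest-wrapped C++ end-to-end tests listed later in this document can be launched like any other integration test; the fixtures generate the required engine first (the output path is arbitrary):

cd tests/integration/defs
pytest "cpp/test_e2e.py::test_model[mamba-86]" --output-dir=/tmp/llm_cpp_test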

4. Performance regression test

For performance regression testing in QA and CI, see the performance test guide.

How to add test to CI

1. How does the CI work

Due to CI hardware resource limitations, and because some cases only run on specific GPUs, the test cases are managed by GPU type.

In the directory integration/test_lists/test-db, each yml file corresponds to a GPU type.

In the file jenkins/L0_Test.groovy, the variable turtleConfigs maps yml files to CI stages.

Currently the yml files are maintained manually, which requires developers to update them when new test cases are added.
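To see which stages exist, list the test-db directory (run from the repository root):

# one yml file per GPU type / CI stage
ls tests/integration/test_lists/test-db/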

How to choose GPU type

The CI resources available for each GPU type differ. Usually you should choose the cheapest GPU that fulfills the test requirements. In most cases, an integration test case should run on only one GPU type, unless it is very important or behaves differently on different GPUs.

The priority is A10 > A30 > L40s > A100 > H100 > B200.

2. Add an integration test

Integration tests usually run an entire workflow, including checkpoint conversion, engine building, and evaluation, to check functionality and accuracy.

Integration tests are stored in integration/defs. In particular, see integration/defs/accuracy for more detailed guidance on adding accuracy tests.

Once a new integration test case is added, the yml files must be updated to contain the newly added case; otherwise, the CI will not be able to collect and run it. A sketch of this workflow follows.
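A minimal sketch, using a made-up test name for illustration (l0_a10.yml is just one possible target list):

# run the new case locally first
cd tests/integration/defs
pytest "accuracy/test_llm_api_pytorch.py::TestMyNewModel::test_auto_dtype"
# then add the same test ID to the test-db yml of the chosen GPU, e.g. in
# tests/integration/test_lists/test-db/l0_a10.yml:
#   - accuracy/test_llm_api_pytorch.py::TestMyNewModel::test_auto_dtype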

3. Add a unit test

A unit test is used to test a standalone feature or building block, and only runs a partial workflow.

For legacy and case-management reasons, the CI doesn't run unit tests directly. Instead, it uses a bridge that maps multiple unit test cases into one integration test case and manages these bridged cases. The bridge is implemented in integration/defs/test_unittests.py and the pytest_generate_tests function in tests/integration/defs/conftest.py.

In integration/test_lists/test-db, cases with the prefix unittest/ are treated as unit test bridges. Each of them generates an instance of test_unittests_v2, which executes a pytest subprocess in the tests/unittest directory. The entire line is passed as the command-line arguments of that pytest subprocess.

For example, unittest/trt/attention/test_gpt_attention.py -k "partition0" is equivalent to cd tests; pytest unittest/trt/attention/test_gpt_attention.py -k "partition0".

New unit tests can be added to CI as follows:

  1. Determine the command line that runs the desired cases. With tests as the working directory, the command usually looks like one of these:
pytest unittest/_torch/my_new_folder # run all cases in a directory
pytest unittest/_torch/my_new_file.py # run all cases in a file
pytest unittest/an_existing_file.py -k "some_keyword or another_keyword" # run some cases in a file, filtered by keywords
pytest unittest/an_existing_file.py -m "part0 and gpu2" # run some cases in a file, filtered by pytest mark
  2. Check existing bridge cases and make sure your cases are not already covered by one of them. For example, you may want to add pytest unittest/an_existing_file.py -k "some_keyword or another_keyword", but there is already pytest unittest/an_existing_file.py -k "not third_keyword", which covers your filter.

  3. Choose a suitable GPU and add a line for your cases. For example, add unittest/an_existing_file.py -k "some_keyword or another_keyword" to tests/integration/test_lists/test-db/l0_a10.yml. You can verify the command locally first, as shown below.
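For example, before editing the yml, the new bridge line can be checked locally (reusing the hypothetical filter from the steps above):

cd tests
pytest unittest/an_existing_file.py -k "some_keyword or another_keyword"
# if this passes, add the same line (without the leading pytest) to the chosen test-db yml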

4. Run a CI stage locally

Each yml file in integration/test_lists/test-db corresponds to a CI stage. You can run a stage locally, e.g. l0_a10.yml, as follows.

  1. Open l0_a10.yml; it should look like this:
version: 0.0.1
l0_a10:
- condition:
    ranges:
      system_gpu_count:
        gte: 1
        lte: 1
    wildcards:
      gpu:
      - '*a10*'
      linux_distribution_name: ubuntu*
  tests:
  # ------------- PyTorch tests ---------------
  - disaggregated/test_disaggregated.py::test_disaggregated_single_gpu_with_mpirun[TinyLlama-1.1B-Chat-v1.0]
  - disaggregated/test_disaggregated.py::test_disaggregated_cuda_graph[TinyLlama-1.1B-Chat-v1.0]
  - disaggregated/test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0]
  - disaggregated/test_disaggregated.py::test_disaggregated_overlap[TinyLlama-1.1B-Chat-v1.0]
  # ------------- CPP tests ---------------
  - cpp/test_e2e.py::test_model[medusa-86]
  - cpp/test_e2e.py::test_model[redrafter-86]
  - cpp/test_e2e.py::test_model[mamba-86]
  - cpp/test_e2e.py::test_model[recurrentgemma-86]
  - cpp/test_e2e.py::test_model[eagle-86]
  2. Copy all items in the tests field to a text file, for example a10_list.txt. Don't forget to remove extra characters such as comments and the leading dashes.
disaggregated/test_disaggregated.py::test_disaggregated_single_gpu_with_mpirun[TinyLlama-1.1B-Chat-v1.0]
disaggregated/test_disaggregated.py::test_disaggregated_cuda_graph[TinyLlama-1.1B-Chat-v1.0]
disaggregated/test_disaggregated.py::test_disaggregated_mixed[TinyLlama-1.1B-Chat-v1.0]
disaggregated/test_disaggregated.py::test_disaggregated_overlap[TinyLlama-1.1B-Chat-v1.0]
cpp/test_e2e.py::test_model[medusa-86]
cpp/test_e2e.py::test_model[redrafter-86]
cpp/test_e2e.py::test_model[mamba-86]
cpp/test_e2e.py::test_model[recurrentgemma-86]
cpp/test_e2e.py::test_model[eagle-86]
  3. Invoke pytest with the TRT-LLM custom option --test-list:
cd tests/integration/defs
pytest . --test-list="a10_list.txt" --output-dir=/tmp/llm_integration_test