Commit Graph

1005 Commits

Author SHA1 Message Date
Venky
b4e598da27
[cherry-pick] test(perf): Add Llama-3_1-Nemotron-Ultra-253B-v1 perf tests (cpp) (#4446) (#4590)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-05-28 14:17:24 +08:00
Venky
42e622a3b9
[cherry-pick] test(perf): Add remaining Phi-4-mini-instruct perf tests (#4443) (#4589)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-28 14:17:18 +08:00
brb-nv
fc3c2f7f7c
fix: Mistral Small vision encoder with BS>1 (#4713)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-05-28 12:49:28 +08:00
HuiGao-NV
1bfc7d4c29
fix: [nvbug5300494] Use runtime total gpu memory to calculate kv cache memory and log more memory information (#4660)
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-05-28 10:00:19 +08:00
Yuxian Qiu
87b50a5736
fix: [nvbugs/5289912][nvbugs/5232406] use thread pool for multi-thread weight loading in fused moe. (#4699)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-28 08:13:06 +08:00
Yiqing Yan
6df8620577
[TRTLLM-5326] - Fix test coverage report generation (#4691)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-05-27 18:24:29 +08:00
Ivy Zhang
fbe48df361
tests: waive and unwaive QA test cases (#4644)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-27 15:19:45 +08:00
Yan Chunwei
10119412ef
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4529)
fix[nvbug/5286515]: trtllm-llmapi-launch on single node single gpu (#4428)
2025-05-27 15:19:04 +08:00
Yanchao Lu
cbb6a264be
[Test] - Correctly waive the Slurm test stage (#4680)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-05-27 13:34:49 +08:00
Martin Marciniszyn Mehringer
8eafe83c37
Update the description for NGC docker images (#4671)
Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-27 10:57:39 +08:00
Michal Guzek
24153c068e
[TRTLLM-4932] Add QA accuracy tests for NIM-prioritized models (#4242)
* Add tests

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add tests v2

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add fixes

Signed-off-by: moraxu <mguzek@nvidia.com>

* Skip fp8 test for Ultra

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add tests for Phi

Signed-off-by: moraxu <mguzek@nvidia.com>

* Skip tests for Phi

Signed-off-by: moraxu <mguzek@nvidia.com>

* Skip tests for Phi - fix

Signed-off-by: moraxu <mguzek@nvidia.com>

* Skip tests for Phi - comment out acc refs

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add more test granularity

Signed-off-by: moraxu <mguzek@nvidia.com>

* Fix examples_test_list.txt

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update test list file

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update yaml files

Signed-off-by: moraxu <mguzek@nvidia.com>

* Address review comments

Signed-off-by: moraxu <mguzek@nvidia.com>

* Remove MMLU tests

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add remaining models

Signed-off-by: moraxu <mguzek@nvidia.com>

---------

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-24 19:17:21 +08:00
Jinyang Yuan
f9a9a1af2e
[fix] Fix Llama4 allgather error due to None tensor (#4511)
* [fix] Fix Llama4 allgather error due to None tensor

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

* Refactor modifications

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

* Minor modification

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

* Minor fix

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>

---------

Signed-off-by: Jinyang Yuan <154768711+jinyangyuan-nvidia@users.noreply.github.com>
2025-05-24 19:12:12 +08:00
Iman Tabrizian
ad4d947b24
Add missing rcca folder (#4591)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-24 03:28:10 +08:00
Michal Guzek
2a2d7ebf2e
[fix] Incorrect mocker argument for a CLI accuracy test in Llama-3.3-70B-Instruct (#4604)
Fix mocker argument

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:18:37 -07:00
Michal Guzek
d2e6af2fe4
[TRTLLM-4932] Add CLI accuracy tests for Llama-3_3-Nemotron-Super-49B-v1 and LLM API FP8 variant (#4375)
* Add CLI TestNemotronSuper acc tests

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update mmlu.yaml

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update yaml files

Signed-off-by: moraxu <mguzek@nvidia.com>

* Skip FP8 test in CLI

Signed-off-by: moraxu <mguzek@nvidia.com>

* Address reviews

Signed-off-by: moraxu <mguzek@nvidia.com>

* Address review comments

Signed-off-by: moraxu <mguzek@nvidia.com>

---------

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-23 12:17:23 -07:00
Faraz
53008d3ee8
[TRTLLM-4618][feat] Add remaining NVFP4 Nemotron Super 49B test on RTX6000 Pro (SM120) (#4548)
added nvfp4 nemotron for qa testing on RTX 6000

Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-05-23 10:42:32 -07:00
Simeng Liu
630b7907a0
[CI] Waive known errors with test TestDeepSeekV3Lite::test_fp8_block_scales_4gpus (#4627)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
2025-05-23 10:33:44 -07:00
Robin Kobus
7c1565a2b6
[nvbugs/5274894] fix: Sort requests for functional correctness and performance (#4608)
* Revert "[nvbugs/5274894] fix: Moving finished context requests to generation (#4576)"

This reverts commit d39bcb6b40.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* fix: Sort requests for functional correctness and performance

- Moved sorting related logic to a dedicated function for better clarity and maintainability.
- Enhanced sorting logic to separate finished context requests from ongoing ones before sorting by LoRA task ID.
- Updated function documentation to reflect the sorting behavior and its purpose.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-23 15:08:54 +02:00
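The sorting behavior described in commit 7c1565a2b6 (split finished context requests from ongoing ones, then sort by LoRA task ID) can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Request` class, its field names, and `sort_context_requests` are hypothetical, and placing ongoing requests before finished ones, as well as sorting requests without a LoRA task first, are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical stand-in for a context request."""
    request_id: int
    context_finished: bool               # True once the last context chunk is done
    lora_task_id: Optional[int] = None   # None means no LoRA adapter


def sort_context_requests(requests: List[Request]) -> List[Request]:
    """Return ongoing requests first, finished ones last (assumed order),
    with each group sorted by LoRA task ID."""

    def lora_key(req: Request):
        # Assumption: requests without a LoRA task sort before those with one.
        return (req.lora_task_id is not None, req.lora_task_id or 0)

    ongoing = [r for r in requests if not r.context_finished]
    finished = [r for r in requests if r.context_finished]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(ongoing, key=lora_key) + sorted(finished, key=lora_key)


requests = [
    Request(0, context_finished=True, lora_task_id=2),
    Request(1, context_finished=False, lora_task_id=5),
    Request(2, context_finished=True),
    Request(3, context_finished=False, lora_task_id=1),
]
ordered = sort_context_requests(requests)
print([r.request_id for r in ordered])
```

Keeping the finished requests contiguous means a later step can hand them over to generation as a single slice, while grouping by LoRA task ID keeps requests that share an adapter adjacent.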
stnie
21af6f77dc
ci: waive testcase [NVBUG 5297821] (#4616)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-05-23 20:54:42 +08:00
Barry Kang
9e15c035a7
Update internal cutlass kernels commit id (#4619)
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-05-23 20:07:41 +08:00
Barry Kang
26793e3569
[https://nvbugs/5289907][fix] Restore per-channel pre-quant (#4545)
* Restore per-channel pre-quant

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

* Update TRT test script

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

* Fix pre-commit

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>

---------

Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-05-23 19:46:53 +08:00
Yukun He
d7701ea6d8
[5180961] chore: Unwaive test for Qwen model. (#4524)
* Unwaive test for Qwen model.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

* update.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>

---------

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-23 13:28:08 +08:00
ruodil
2ce14357ff
test: fix for perf sanity test and skip fp8 deepseek blackwell cases (#4598)
fix for sanity test and skip fp8 deepseek blackwell cases

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-23 11:13:14 +08:00
Venky
d15ceae62e
test(perf): Extend the Llama-Nemotron-Nano-8B perf-integration-tests (pyt) (#4407)
* extend pyt nano tests perf coverage

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

* explicitly set maxnt for some cases

This is because the test harness defaults to no prefill chunking, which means the specified isl is the true context length.
When `maxnt` is not explicitly specified in the test harness, the value passed down to `trtllm-bench` is 2048.
As a result, trtllm-bench gets conflicting inputs when isl>2048 but maxnt=2048; hence maxnt is overridden to be consistent with isl for such cases.

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-23 08:44:37 +08:00
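The `maxnt` conflict described in commit d15ceae62e can be sketched as a small check. The commit sets `maxnt` explicitly per test case; the helpers below (`has_token_limit_conflict`, `resolve_max_num_tokens`) are hypothetical names used only to illustrate the rule, and only the default of 2048 comes from the commit message.

```python
from typing import Optional

DEFAULT_MAX_NUM_TOKENS = 2048  # default stated in the commit message


def has_token_limit_conflict(isl: int, max_num_tokens: int) -> bool:
    """Without prefill chunking the whole input must fit in one pass,
    so an isl above max_num_tokens is an inconsistent configuration."""
    return isl > max_num_tokens


def resolve_max_num_tokens(isl: int, max_num_tokens: Optional[int] = None) -> int:
    """Hypothetical helper: keep an explicit value, otherwise raise the
    default so that it is at least as large as isl."""
    if max_num_tokens is not None:
        return max_num_tokens
    return max(DEFAULT_MAX_NUM_TOKENS, isl)


for isl in (1024, 5000):
    conflict = has_token_limit_conflict(isl, DEFAULT_MAX_NUM_TOKENS)
    print(isl, conflict, resolve_max_num_tokens(isl))
```

An explicit value is returned unchanged, which mirrors the commit's approach of overriding `maxnt` only in the cases where isl exceeds the default.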
Yukun He
dd79631b77
[5234029][5226211] chore: Unwaive multimodal tests for Qwen model. (#4519)
Unwaive multimodal tests for Qwen models.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-23 08:04:56 +08:00
Robin Kobus
d39bcb6b40
[nvbugs/5274894] fix: Moving finished context requests to generation (#4576)
fix: Moving finished context requests to generation

- Unfinished chunked context requests appear at end of context requests vector.
- Replaced std::find_if with std::partition to find the correct position to move finished context requests to generation.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-05-22 17:49:40 +02:00
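Commit d39bcb6b40 replaces `std::find_if` with `std::partition` to locate the finished context requests. A first-match search only gives the right split point when finished and unfinished requests are already grouped, whereas a partition regroups them itself. The Python sketch below mimics that idea; `partition` and `move_finished_to_generation` are hypothetical helpers rather than project code, and putting unfinished requests first is an assumption.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def partition(items: List[T], pred: Callable[[T], bool]) -> int:
    """Reorder items in place so that elements satisfying pred come first.

    Returns the index of the first element that does not satisfy pred.
    Like std::partition, relative order inside each group is not preserved.
    """
    first = 0
    for i in range(len(items)):
        if pred(items[i]):
            items[first], items[i] = items[i], items[first]
            first += 1
    return first


def move_finished_to_generation(
    context: List[T], generation: List[T], is_finished: Callable[[T], bool]
) -> None:
    """Move every finished context request into the generation list."""
    split = partition(context, lambda r: not is_finished(r))
    generation.extend(context[split:])
    del context[split:]


# Finished and unfinished requests are interleaved on purpose.
context = [("a", True), ("b", False), ("c", True), ("d", False)]
generation: list = []
move_finished_to_generation(context, generation, lambda r: r[1])
print(sorted(context), sorted(generation))
```

The later commit 7c1565a2b6 reverts this change and handles the ordering with a dedicated sorting function instead.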
ruodil
3d083b69be
test: waive hanging cases for perf test (#4563)
waive hanging cases

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-22 21:09:12 +08:00
Yukun He
21ada0a961
[5141290][5273694][5260696] fix: Fix mrope argument missing issue in the summary tasks for Qwen model. (#4432)
Fixed the mrope argument missing issue in the summary tasks for Qwen models.
And re-enabled the fixed tests.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-05-22 17:45:59 +08:00
Martin Marciniszyn Mehringer
1ad82a0b15
fix: [TRTLLM-325]WAR against security vulnerabilities in Python packages (#4539)
* fix: [TRTLLM-325]WAR against security vulnerabilities in Python packages

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Update docker images

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

---------

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-22 08:33:20 +08:00
Iman Tabrizian
d4cccdc48b
Add tritonrelease container (#4544)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-05-21 10:48:30 -07:00
ruodil
ce6a32997b
test: add failed case in waive list and fix some test script issue for perf test (#4528)
add failed case in waive list and fix some test script issue

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
2025-05-21 16:36:32 +08:00
Robin Kobus
cc490de92c
docs: Add KV Cache Management documentation (#3908)
* docs: Add KV Cache Management documentation

* Introduced a new document detailing the hierarchy and event system for KV cache management, including definitions for Pool, Block, and Page.
* Updated the index.rst to include a reference to the new kv-cache-management.md file.

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* Update docs/source/advanced/kv-cache-management.md

Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* Update KV Cache Pool Management

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* docs: Add cross-file links

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* docs: Clarify tokens_per_block

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

* docs: Clarify acronyms

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>

---------

Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
Co-authored-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
2025-05-21 08:39:28 +02:00
Ivy Zhang
e977c75300
tests: update api change from decoder to sampler in test (#4479)
update

Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-05-21 14:22:18 +08:00
Daniel Cámpora
cc3f8e6431
fix: Fix trtllm sampler beam width bug (#4507)
* Fix TRTLLMSampler.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

* Added type hint.

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>

---------

Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>
2025-05-21 14:21:39 +08:00
Yuxian Qiu
ff0f37bcf8
chore: Deprecate autopp. (#4471)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-21 13:50:11 +08:00
Kaiyu Xie
7ae6cd73b5
chore: Remove unused script (#4485)
* Remove unused script

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

* Remove unused json file

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>

---------

Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-05-21 13:46:39 +08:00
Yuxian Qiu
f8bd372c59
Cherry pick https://github.com/NVIDIA/TensorRT-LLM/pull/4447 (#4517)
fix: skip weights defined in create_weights for pp. (#4447)

Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-05-21 13:30:21 +08:00
QI JUN
74928b55e9
Cherry pick #4508 (#4512)
Chore: waive torch compile test cases of deepseek v3 lite (#4508)

waive torch compile test cases of deepseek v3 lite

Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-05-21 11:25:36 +08:00
Yuan Tong
4ea29b3072
fix: cleanup process tree for disaggregated test (#4116)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2025-05-21 11:01:14 +08:00
Shi Xiaowei
6547f8b932
fix: replace the image links in the blog (#4490)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-05-20 22:39:58 +08:00
Zhanrui Sun
0e7abba952
chore: bump version to 0.20.0 (#4469)
Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-05-20 15:27:29 +08:00
Martin Marciniszyn Mehringer
3485347584
doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and document (#4400)
* doc: [TRTLLM-325]Integrate the NGC image in Makefile automation and documentation

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* WAR against https://github.com/advisories/GHSA-vqfr-h8mv-ghfj

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Fix default assignment for CUDA architectures in SBSA build

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Push new docker images

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

* Handle constraints.txt in setup.py

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>

---------

Signed-off-by: Martin Marciniszyn Mehringer <11665257+MartinMarciniszyn@users.noreply.github.com>
2025-05-19 23:45:01 -07:00
Zhanrui Sun
f2c0565577
chore: bump version to 0.21.0rc0 (#4465)
* chore: bump version to 0.21.0rc0

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>

* Update CODEOWNERS

Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>

---------

Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>
Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>
2025-05-20 12:19:50 +08:00
Lucas Liebenwein
de409e8468
[AutoDeploy] HF factory improvements (#4371)
* [AutoDeploy] HF factory improvements

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

* improve monkey-patches and add unit tests

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>

---------

Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-05-19 20:13:43 -07:00
ruodil
b5edf13b33
test: update test filter in perf test yml file to select cases by gpu name and add cases for RTX 6000 pro (#4282)
* add cases for rtx_pro_6000 and update test filter

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

* amend a typo in model llama_v3.1_405b_instruct fp4 and add more cases for rtx pro 6000 and waive_list

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>

---------

Signed-off-by: Ruodi <200874449+ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-05-20 10:58:05 +08:00
Michal Guzek
0a342a42f7
[TRTLLM-4932] Add CLI accuracy tests for Llama-3.3-70B-Instruct and LLM API BF16 variant (#4362)
* Add CLI TestLlama3_3_70BInstruct acc tests

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add tests to qa lists

Signed-off-by: moraxu <mguzek@nvidia.com>

* Add comment

Signed-off-by: moraxu <mguzek@nvidia.com>

* Fix test names

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update yaml files

Signed-off-by: moraxu <mguzek@nvidia.com>

* Update cli file

Signed-off-by: moraxu <mguzek@nvidia.com>

---------

Signed-off-by: moraxu <mguzek@nvidia.com>
2025-05-20 09:48:14 +08:00
xinhe-nv
402385588d
test: [CI] Add failed cases into waives.txt (#4429)
* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive id

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

* update waive list

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>

---------

Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-05-20 09:43:55 +08:00
kanghui0204
6f3922f318
feat: Low Precision Allreduce for PCIe based GPU (#4344)
This PR adds a customized allreduce to TensorRT-LLM. The new allreduce is used for communication on PCIe-based GPUs via low-precision quantization, which can accelerate the PCIe allreduce process.

Signed-off-by: Hui Kang <hkang@nvidia.com>
Co-authored-by: Hui Kang <hkang@nvidia.com>
2025-05-20 06:53:46 +08:00
Yuxian Qiu
c8e062bfd3
fix: [nvbugs/5287097] Align PP layer distribution between pytorch and TRT flow. (#4399)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
Co-authored-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-05-19 14:25:36 -07:00
Venky
bb02d86b54
test(perf): Add some Llama-3_3-Nemotron-Super-49B-v1 integration-perf-tests (TRT flow, trtllm-bench) (#4128)
* changes to run llama-v3.3-nemotron-super-49b

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* yapf

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* address review comments pt 1

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>

* re-add cpp super tests 

Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>

---------

Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2025-05-19 12:00:48 -07:00