Simeng Liu
99995846b3
[https://nvbugs/5470782][chore] Remove the skip statement in 1.0 rele… (#7573)
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
peaceh-nv
541b7fda89
[https://nvbugs/5503423][waive] Waive Llama3.1-70B-FP8 test on RTX PRO 6000 (#7603)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yan Chunwei
afca2fcbe0
[https://nvbugs/5351244][fix] test_mpi_session (#7501)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Yuxian Qiu
2d46dda6a7
[https://nvbugs/5448754][fix] Download HF model for all nodes. (#6824)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Lizhi Zhou
293d9fb612
[https://nvbugs/5448767][fix] disable kv cache reuse for disagg pp>1 tests (#7354)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00
Stefan Niebler
8aead224fb
[https://nvbugs/5513423][fix] Correctly respect min_tokens in PyTorch Workflow (#7808)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Co-authored-by: Daniel Cámpora <961215+dcampora@users.noreply.github.com>
2025-09-21 22:15:18 -07:00
peaceh-nv
9dc7316b7f
[https://nvbugs/5512556][unwaive] Unwaive DeepSeek PP tests (#7828)
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
2025-09-22 10:26:30 +08:00
dongxuy04
9eb8084ca9
[TRTLLM-7008][fix] Cherry-pick to main: add automatic shared memory deletion if it already exists (#7727)
Signed-off-by: Dongxu Yang <78518666+dongxuy04@users.noreply.github.com>
2025-09-21 11:01:51 -07:00
Ziyi Xiong
897c4dd23b
[https://nvbugs/5517404][fix] Use the correct cuda graph for dynamic spec dec (#7728)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-21 08:20:48 +08:00
Yan Chunwei
4509d97780
[TRTLLM-8188][chore] refactor GenerationExecutorWorker with WorkerBase for better code reuse (#7840)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-20 06:24:22 -07:00
Chang Liu
2e317a7db6
[https://nvbugs/5520490][fix] Fix intermittent test failures by avoiding external web data pulls (#7879)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-19 17:24:13 -07:00
Mike Iovine
8030b540ac
[https://nvbugs/5522462][fix] Fix FP8 scout illegal memory access (#7845)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2025-09-19 10:30:37 -04:00
pcastonguay
fbe325ce57
[https://nvbugs/5471108][chore] Unwaiving disagg acc test (#7686)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-09-19 08:56:09 -04:00
Yuxian Qiu
7d28acdbf0
[https://nvbugs/5522332][fix] Pin numpy version for Gemma. (cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/7783) (#7797)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 18:50:40 +08:00
Liao Lanyu
18095a7cb8
[https://nvbugs/5503440][fix] Fix potential hang due to wrong type of ZMQ socket and protocol for worker_init_status_queue (#7646)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
2025-09-19 18:13:33 +08:00
xinhe-nv
efb763402f
[None][chore] Add failed cases into waives.txt (#7841)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-19 17:59:47 +08:00
Ivy Zhang
0ac51487f4
[None][chore] remove cli cases for rtx6k (#7833)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:33:59 +08:00
Ivy Zhang
6b33bcced2
[None][test] Add accuracy benchmark in stress test (#7561)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-19 16:09:46 +08:00
dominicshanshan
451475e0dc
[None][ci] Waive llama3 auto dtype test, bug in https://nvbugs/5527956. (#7853)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-19 14:54:59 +08:00
Emma Qiao
ea079fa530
[None][infra] Waive failed tests in post-merge (#7859)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-19 14:16:12 +08:00
ruodil
c5453103d6
[None][test] add deepseek r1/v3 model with chunked prefill cases (#7124)
Signed-off-by: ruodil <200874449+ruodil@users.noreply.github.com>
2025-09-19 11:12:53 +08:00
fredricz-20070104
fc4e6d3702
[TRTLLM-7183][test] Fix model issue for disagg serving (#7785)
Signed-off-by: FredricZ-2007 <226039983+fredricz-20070104@users.noreply.github.com>
2025-09-19 10:12:55 +08:00
Yuxian Qiu
d6ebcf7c4a
[TRTLLM-6994][feat] FP8 Context MLA integration (Cherry-pick https://github.com/NVIDIA/TensorRT-LLM/pull/6059 from release/1.1.0rc2) (#7610)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-09-19 09:40:49 +08:00
Ziyi Xiong
420f0fbcf5
[https://nvbugs/5522851][fix] Correct the logic to update kv_lens_cuda (#7790)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-19 08:11:29 +08:00
QI JUN
7646da2d85
[None][ci] set TORCHINDUCTOR_COMPILE_THREADS correctly (#7800)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-19 07:19:50 +08:00
sunnyqgg
80dd8fe197
[TRTLLM-6746][feat] Enable two-model spec dec for MTP Eagle (#7001)
Signed-off-by: qgai <qgai@nvidia.com>
2025-09-18 12:05:36 -04:00
Li Min
d921fc3352
[TRTLLM-6898][feat] Add swapab, tileN64, cga sync support for cute dsl nvfp4 gemm (#7764)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-18 21:20:04 +08:00
xinhe-nv
d3a907131a
[https://nvbugs/5519462][fix] Add failed cases into waives.txt (#7817)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 20:01:06 +08:00
Wanli Jiang
fe104dc20d
[TRTLLM-7918][feat] Support kv cache reuse and chunked prefill for phi4mm (#7723)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 17:37:16 +08:00
xinhe-nv
d909f80379
[TRTLLM-7250][fix] Add failed cases into waives.txt (#7807)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 17:13:07 +08:00
Wanli Jiang
a7ca0fff54
[TRTLLM-6577][feat] Support nano_v2_vlm in pytorch backend (#7207)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-09-18 16:26:20 +08:00
dongfengy
2ae08bd1b8
[https://nvbugs/5519530][fix] Fix gptoss 2-gpu test (#7819)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-18 16:01:53 +08:00
xinhe-nv
236f71ea05
[None][chore] Add failed cases into waives.txt (#7801)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 14:48:16 +08:00
Leslie Fang
870cfcf9a0
[None][chore] Remove executor config in create_py_executor (#7599)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-18 14:24:58 +08:00
Li Min
14e455da3e
[None][fix] Fix CI issue for dsl pkg install (#7784)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-09-18 13:58:20 +08:00
Ivy Zhang
26d50eb539
[TRTLLM-8070][test] add generation logits case for llama3 (#7759)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-18 13:33:16 +08:00
Yukun He
cd80e0a7f1
[None][fix] Make the tile_tokens_dim calculation just-in-time before kernel launch. (#7529)
tile_tokens_dim depends directly on num_tokens, which is a dynamic shape during both tuning and inference. When the AutoTuner prepares dummy tensors with different num_tokens, it does not automatically update tile_tokens_dim, so the value stored in the AutoTuner cache is misaligned; this introduces many cache misses during inference and hurts performance significantly.
To avoid this, the calculation of tile_tokens_dim is moved to just before kernel launch, so it always matches the num_tokens of the input tensor currently handed to the kernel runner.
In addition, tile_tokens_dim is now calculated from the token count of a tuned bucket rather than the raw input token count: tuning only covers the buckets, not arbitrary token counts, so deriving both values from the same bucket avoids unexpected misalignment between tile_tokens_dim and the token number.
This PR also removes the warmup requests with extra input shapes that were triggered during the CUDA graph warmup phase.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-18 10:58:52 +08:00
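To make the fix in the commit above concrete, here is a minimal Python sketch of the described just-in-time scheme: derive tile_tokens_dim right before the launch, and from the tuned bucket rather than the raw token count. Every name in it (compute_tile_tokens_dim, nearest_tuned_bucket, launch_kernel, the power-of-two heuristic) is a hypothetical illustration, not the actual TensorRT-LLM code.

```python
def compute_tile_tokens_dim(num_tokens: int) -> int:
    """Pick the largest power-of-two tile <= min(num_tokens, 64).

    A simple stand-in for whatever heuristic the kernel runner uses.
    """
    tile = 8
    while tile * 2 <= min(num_tokens, 64):
        tile *= 2
    return tile

def nearest_tuned_bucket(num_tokens: int, buckets: list[int]) -> int:
    """Round num_tokens up to the smallest tuned bucket that covers it."""
    return min((b for b in sorted(buckets) if b >= num_tokens),
               default=max(buckets))

def launch_kernel(num_tokens: int, tile_tokens_dim: int) -> None:
    """Placeholder for the real kernel launch."""
    print(f"launch: num_tokens={num_tokens}, tile_tokens_dim={tile_tokens_dim}")

def run(num_tokens: int, tuned_buckets: list[int]) -> None:
    # Compute tile_tokens_dim just in time, from the tuned bucket, so it
    # always agrees with the configuration the AutoTuner cached for that
    # bucket instead of a stale value keyed to some earlier num_tokens.
    bucket = nearest_tuned_bucket(num_tokens, tuned_buckets)
    tile_tokens_dim = compute_tile_tokens_dim(bucket)
    launch_kernel(num_tokens, tile_tokens_dim)

run(num_tokens=37, tuned_buckets=[16, 32, 64, 128])  # -> bucket 64, tile 64
```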
Yan Chunwei
327e5e5eed
[None][ci] restore unwaive list (#7802)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-18 10:50:34 +08:00
Lucas Liebenwein
39eb120b96
[#7308][feat] AutoDeploy: graph-less transformers mode for HF (#7635)
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-09-18 10:44:24 +08:00
Netanel Haber
a5cfc8368f
[https://nvbugs/5508536][fix] Revert #7041: Move stop_criteria to sample_async (#7041) (#7796)
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-09-17 21:27:01 -04:00
yunruis
7c03eb9ea2
[https://nvbugs/5516661][fix] Drop waive case 5516661 (#7791)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-09-18 08:55:32 +08:00
Emma Qiao
c4abca323e
[None][infra] Waive failed tests on main (#7812)
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-17 23:44:36 +08:00
William Zhang
2614d71994
[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628)
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-17 08:11:16 -07:00
xinhe-nv
f918302b3a
[TRTLLM-7250][fix] waive block tests (#7782)
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:31:03 +08:00
ruodil
e6073b3911
[None][test] add gpt oss model for trtllm perf test (#7328)
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-09-17 15:23:21 +08:00
xinhe-nv
7801d0992b
[None][chore] Remove closed bugs (#7697)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:14:09 +08:00
QI JUN
d3e680b3c3
[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] (#7788)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 15:12:55 +08:00
Fanrong Li
523a17d990
[https://nvbugs/5485325][fix] Cherry-pick #7373: fix the CUDA graph warmup issue when using speculative decoding (#7734)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-17 13:57:39 +08:00
QI JUN
bd7aad4988
[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 10:48:31 +08:00
Lucas Liebenwein
4c3dc89f84
[None][chore] AutoDeploy: cleanup of model unit test configuration (#7742)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-17 10:42:01 +08:00