dongfengy
2ae08bd1b8
[https://nvbugs/5519530][fix] Fix gptoss 2-gpu test (#7819)
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2025-09-18 16:01:53 +08:00
xinhe-nv
236f71ea05
[None][chore] Add failed cases into waives.txt (#7801)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-18 14:48:16 +08:00
Leslie Fang
870cfcf9a0
[None][chore] Remove executor config in create_py_executor (#7599)
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2025-09-18 14:24:58 +08:00
yuanjingx87
b6e916b762
[None][infra] update ci allow list 2025/09/17 (#7816)
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-17 23:21:40 -07:00
mpikulski
1c7f601265
[https://nvbugs/5508890][fix] gen. result cleanup when using PostprocWorker (#7771)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-09-18 14:01:18 +08:00
Li Min
14e455da3e
[None][fix] Fix CI issue for dsl pkg install (#7784)
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-09-18 13:58:20 +08:00
Barry Kang
4f0e6b5f96
[None][feat] Cherry-pick DeepGEMM related commits from release/1.1.0rc2 (#7716)
...
Signed-off-by: Barry Kang <43644113+Barry-Delaney@users.noreply.github.com>
2025-09-18 13:51:48 +08:00
Ziyi Xiong
28469dbf27
[https://nvbugs/5523080][fix] Correct the batch index in device tensors (#7803)
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-18 13:45:37 +08:00
Ivy Zhang
26d50eb539
[TRTLLM-8070][test] add generation logits case for llama3 (#7759)
...
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
2025-09-18 13:33:16 +08:00
Guoming Zhang
e0423bfaab
[https://nvbugs/5519544][fix] fix invalid expression for disabling pa… (#7806)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-09-18 12:54:52 +08:00
Yanchao Lu
f8e811d134
[None][chore] Version bump for 1.1.0rc6 (#7824)
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2025-09-18 11:13:56 +08:00
Yukun He
cd80e0a7f1
[None][fix] Make tile_tokens_dim calculation just in time before kernel launching. (#7529)
...
tile_tokens_dim directly depends on num_tokens, which is a dynamic shape during tuning and inference. When the AutoTuner prepares dummy tensors with different num_tokens, it does not automatically update the value of tile_tokens_dim. As a result, the value stored in the AutoTuner cache is misaligned, which introduces many cache misses during inference and significantly hurts performance.
To avoid this issue, we move the calculation of tile_tokens_dim to right before kernel launch, so that its value always stays in sync with the num_tokens of the current input tensor used by the kernel runner.
In addition, tile_tokens_dim is calculated from the token count of a tuned bucket rather than from the raw input token count, because we only tune the value for the buckets, not for the raw input token count; this avoids unexpected misalignment between tile_tokens_dim and the token count.
This PR also removes the warmup requests with the extra input shapes, which are triggered in the CUDA graph warmup phase.
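A minimal sketch of the just-in-time idea described above (Python; the function and class names here are illustrative assumptions, not the actual TensorRT-LLM kernel-runner or AutoTuner APIs):
```python
# Hypothetical sketch: derive tile_tokens_dim right before launch, from the
# tuned bucket that the current num_tokens falls into, so it always matches
# the value the AutoTuner cache was keyed on. Names are illustrative only.

def compute_tile_tokens_dim(num_tokens: int, max_tile: int = 64) -> int:
    """Pick the largest power-of-two tile size <= min(num_tokens, max_tile)."""
    tile = 1
    while tile * 2 <= min(num_tokens, max_tile):
        tile *= 2
    return tile

class KernelRunner:
    def __init__(self, tuned_buckets):
        # Token-count buckets the AutoTuner actually profiled, e.g. [8, 64, 512, 4096].
        self.tuned_buckets = sorted(tuned_buckets)

    def launch(self, num_tokens: int):
        # Round num_tokens up to the nearest tuned bucket, then compute
        # tile_tokens_dim from that bucket (not the raw token count)
        # just in time for the kernel launch.
        bucket = next((b for b in self.tuned_buckets if b >= num_tokens),
                      self.tuned_buckets[-1])
        tile_tokens_dim = compute_tile_tokens_dim(bucket)
        print(f"num_tokens={num_tokens} -> bucket={bucket}, tile_tokens_dim={tile_tokens_dim}")
        # ... the fused kernel would be launched here with tile_tokens_dim ...

runner = KernelRunner(tuned_buckets=[8, 64, 512, 4096])
runner.launch(100)    # falls into the 512 bucket
runner.launch(4000)   # falls into the 4096 bucket
```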
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-18 10:58:52 +08:00
Yan Chunwei
327e5e5eed
[None][ci] restore unwaive list (#7802)
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2025-09-18 10:50:34 +08:00
Lucas Liebenwein
39eb120b96
[#7308] [feat] AutoDeploy: graph-less transformers mode for HF (#7635)
...
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Co-authored-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
2025-09-18 10:44:24 +08:00
Netanel Haber
a5cfc8368f
[https://nvbugs/5508536][fix] Revert #7041: Move stop_criteria to sample_async (#7041) (#7796)
...
Signed-off-by: Netanel Haber <nhaber@nvidia.com>
Signed-off-by: Netanel Haber <58652339+netanel-haber@users.noreply.github.com>
Co-authored-by: Mike Iovine <miovine@nvidia.com>
2025-09-17 21:27:01 -04:00
yunruis
7c03eb9ea2
[https://nvbugs/5516661][fix] Drop waive case 5516661 (#7791)
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-09-18 08:55:32 +08:00
Matthias Jouanneaux
022d77807d
[TRTLLM-5966][feat] Helix: make softmax stats pointer available to attention gen (#6865)
...
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
2025-09-18 05:01:24 +08:00
Anu
2b1472fb0a
[None][doc] Update Documentation link to point to docs instead of docs source code (#6495)
...
Signed-off-by: Anu <asrivastava274@gmail.com>
2025-09-18 04:39:18 +08:00
Emma Qiao
c4abca323e
[None][infra] Waive failed tests on main (#7812)
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2025-09-17 23:44:36 +08:00
William Zhang
2614d71994
[TRTLLM-7410][feat] Enable KV cache reuse and chunked prefill for mistral3.1 (#7628)
...
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-09-17 08:11:16 -07:00
QI JUN
d3467f9f12
[None][doc] fix section header of llm_kv_cache_offloading example (#7795)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 17:26:11 +08:00
xinhe-nv
f918302b3a
[TRTLLM-7250][fix] waive block tests (#7782)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:31:03 +08:00
ruodil
e6073b3911
[None][test] add gpt oss model for trtllm perf test (#7328)
...
Signed-off-by: Ruodi Lu <ruodil@nvidia.com>
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Larry <197874197+LarryXFly@users.noreply.github.com>
2025-09-17 15:23:21 +08:00
xinhe-nv
7801d0992b
[None][chore] Remove closed bugs (#7697)
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2025-09-17 15:14:09 +08:00
QI JUN
d3e680b3c3
[None][ci] waive test_llama_eagle3[True-FLASHINFER-False-False-False-False-True] (#7788)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 15:12:55 +08:00
Fanrong Li
523a17d990
[https://nvbugs/5485325][fix] Cherry-pick #7373: fix the CUDA graph warmup issue when using speculative decoding (#7734)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
2025-09-17 13:57:39 +08:00
QI JUN
39248320d4
[None][feat] add an example of KV cache host offloading (#7767)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 13:51:15 +08:00
Zhenhuan Chen
6983e8a00d
[https://nvbugs/5517260][fix] move scaffolding contrib module's import to subdirectory (#7758)
...
Signed-off-by: Zhenhuan Chen <chenzhh3671@gmail.com>
2025-09-17 11:36:33 +08:00
QI JUN
bd7aad4988
[None][ci] waive test_llm_gemma_1gpu_summary_vswa (#7781)
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-09-17 10:48:31 +08:00
Lucas Liebenwein
4c3dc89f84
[None][chore] AutoDeploy: clean up of model unit test configuration (#7742)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-17 10:42:01 +08:00
Kaiyu Xie
62042a9733
[TRTLLM-6741] [feat] enable LM tp for MTP, under attention dp case (cherry-pick #7128) (#7571)
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
Co-authored-by: Cheng Hang <chang@nvidia.com>
2025-09-17 09:41:32 +08:00
Yukun He
6313c9799c
[https://nvbugs/5488582][fix] Cherry-pick 7495: Avoid unexpected Triton recompilation in DG fused_moe (#7708)
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-09-17 09:00:28 +08:00
Shiyu Li
8bdbb48264
[https://nvbugs/5489015][fix] Support communicator split in MNNVL allreduce and fix the binding issues. (#7387)
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-09-17 07:43:20 +08:00
Iman Tabrizian
a91453de34
[None][waive] Waive tests (#7775)
...
Signed-off-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2025-09-16 19:42:32 -04:00
HuiGao-NV
a49cfb3e68
[https://nvbugs/5516666][fix] cherry-pick fix to the CUDA graph warmup issue when using speculative decoding (#7737)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Tao Li @ NVIDIA <tali@nvidia.com>
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-17 06:24:20 +08:00
yuanjingx87
eeb89a167c
[None][infra] Add nightly pipeline to generate lock files (#5798)
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-16 15:00:03 -07:00
yuanjingx87
88d9d77912
[None][infra] Update CI allowlist 2025-09-16 (#7773)
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-16 14:55:30 -07:00
Chang Liu
98f533453a
[TRTLLM-7398][doc] Add doc for KV cache salting support (#7772)
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-09-16 14:49:14 -07:00
yuanjingx87
0f30d7dd6f
[None][infra] add nspect allow list for false positive secrets (#5797)
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2025-09-16 13:23:01 -07:00
Aurelien Chartier
471723bce1
[None][chore] Remove unused get_quant_scales methods (#7687)
...
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-09-16 12:56:11 -07:00
Lucas Liebenwein
9befd1a72f
[None][chore] AutoDeploy: neat disablement of transforms in pipeline (#7736)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-09-16 23:31:48 +08:00
Iman Tabrizian
6ce0624208
[TRTLLM-8044][refactor] Rename data -> cache for cacheTransceiver (#7659)
2025-09-16 08:43:56 -04:00
bhsueh_NV
8226ef23dc
Revert "[None][feat] support attention dp for qwen3 dense model" ( #7765 )
2025-09-16 19:09:04 +08:00
xinhe-nv
e7c1569456
[None][chore] Add failed cases into waives.txt (#7746)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 18:43:40 +08:00
Ziyi Xiong
905bb26bbd
[https://nvbugs/5471106][fix] Remove the waivers (#7711)
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-09-16 17:43:39 +08:00
xinhe-nv
c6ab2072b5
[None][fix] waive hang tests on main (#7720)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 17:05:15 +08:00
xinhe-nv
1fbea497ff
[TRTLLM-7070][feat] add gpt-oss serve benchmark tests (#7638)
...
Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>
2025-09-16 16:39:31 +08:00
amitz-nv
750d15bfaa
[https://nvbugs/5503529][fix] Change test_llmapi_example_multilora to get adapters path from cmd line to avoid downloading from HF (#7740)
...
Signed-off-by: Amit Zuker <203509407+amitz-nv@users.noreply.github.com>
2025-09-16 16:35:13 +08:00
Kaiyu Xie
6eef19297f
[None] [chore] cherry pick changes on slurm scripts from release/1.1.0rc2 (#7750)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2025-09-16 16:07:13 +08:00
Li Min
b278d06481
[TRTLLM-6898][feat] Add Cute DSL nvfp4 linear op (#7632)
...
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-09-16 14:25:26 +08:00