867 Commits

Author SHA1 Message Date
Ludwig Schneider
5130cbd73e
[None][fix] Pre-Allocation for Auto-Tuning NCCL_SYMMETRIC (#11326)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-02-12 14:31:51 -08:00
Wanli Jiang
421eb9e39c
[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion (#11273)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-02-12 09:25:31 -05:00
Simeng Liu
12085536df
[TRTLLM-10487][feat] Add user-provided UUID support for multimodal KV cache identification. (#11075)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-12 00:48:47 -05:00
Yukun He
632c039aea
[TRTLLM-10793][feat] Add BOLT compatible build flags for further experimental usage. (#11297)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 09:54:58 +08:00
Harris Nover
2c4a4c7b94
[None][fix] Fix out-of-bounds array access in kernel factory Get() methods (#11373)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 19:21:01 -05:00
Iman Tabrizian
7d992972b2
[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ (#10540)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-10 07:20:56 -08:00
Bo Deng
be88fe33be
[None][fix] fix tinygemm accuracy (#11411)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-02-10 05:09:30 -05:00
Jonas Li
8b2dc57823
[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch (#11384)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2026-02-10 14:00:42 +08:00
Yuxian Qiu
af68c29d3d
[None][chore] Reduce attention module repeated warnings. (#11335)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 08:58:21 +08:00
Iman Tabrizian
18e611da77
[https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-06 14:23:51 -05:00
yifeizhang-c
5521c7b7e7
[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130)
Added FP8 cute dsl gemm and batch gemm.

Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2026-02-06 09:49:30 +08:00
Yuxian Qiu
d3d951d837
[None][fix] Fix amax to avoid NaN issue in fp8_blockscale_gemm_kernel. (#11256)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-06 00:28:29 +08:00
Yuewei Na
0d18b2d7a4
[None][feat] Add priority-based KV cache offload filtering support (#10751)
Signed-off-by: Yuewei Na <yna@nvidia.com>
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
2026-02-05 05:22:56 -05:00
Simeng Liu
d9fd8cc951
[https://nvbugs/5674665][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-04 12:46:31 -05:00
Yukun He
de465efc5f
[https://nvbugs/5814309][fix] Use NCCL as fallback to avoid crash due to insufficient memory (#10928)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Zhenhuan Chen
6c2ecad2fe
[https://nvbugs/5769425][fix] add syncthreads for tinygemm to resolve intermittent accuracy problem (#10873)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
HuiGao-NV
8fd22ac72d
[https://nvbugs/5740377][fix] Prevent out-of-bounds read (#10868)
Signed-off-by: Hui Gao <huig@nvidia.com>
Co-authored-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Yi Zhang
0306c0f12c
[TRTLLM-9766][feat] Integration of the KVCacheManager V2 to TRTLLM Runtime (#10659)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-02 14:29:02 +08:00
Kaiyu Xie
9909dca6fa
[None] [feat] Add PDL support for moeAlltoAllKernels (#10591)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
Co-authored-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-02-02 13:23:37 +08:00
Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super (#10791)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Yibin Li
322471cdd7
[https://nvbugs/5825514][fix] Add null pointer check to parseNpyHeader (#10944)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
This PR addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information visit https://www.nvidia.com/en-us/security/, for acknowledgement please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com
2026-01-30 03:01:33 -05:00
Jin Li
ef268e2062
[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support (#11029)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-01-30 01:49:17 -05:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Yi Sun
f6dab8388d
[https://nvbugs/5813452][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" (#10922)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2026-01-29 14:38:11 +08:00
Ludwig Schneider
4e10bf8950
[None][fix] nccl symmetric with graceful fallbacks (#11042)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-28 15:43:24 -08:00
Linda
29647d9446
[None][chore] Removing cpp/tensorrt_llm/pybind (#11026)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-28 11:25:11 +01:00
Yuan Tong
30348b2753
[None][fix] Proper conditional compilation of sm10x cubins (#10839)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-01-28 10:17:51 +08:00
NVShreyas
6c1862fb33
[TRTLLM-10197][chore] Refactor to setup for RNN cache transceiver (#10957)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2026-01-27 12:23:02 -08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Linda
ce556290c9
[None][chore] Removing pybind11 bindings and references (#10550)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-26 08:19:12 -05:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Patrice Castonguay
d548b29a41
[None][fix] Bugfix/mtp with async scheduler (#10941)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: rongwei <scutizhang@tencent.com>
2026-01-24 07:19:54 -05:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new Python-based implementation of the KV cache manager, featuring a cleaner API, better abstraction, and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
Yi Zhang
d43be7b65e
[None][fix] Avoid Double update for previous batch (#9888)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-01-22 13:15:06 -05:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Jiayu Chang
1dc49b266e
[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Yukun He
bf7303c7f1
[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-21 17:25:40 +08:00
HuiGao-NV
1592dfab6d
[https://nvbugs/5740377][fix] Lock resource to fix potential access to released data (#10827)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-21 14:17:29 +08:00
Daniel Stokes
2f3b2a3172
[None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes (#9532)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-21 10:14:39 +08:00
Zheng Duan
26c23cf99f
[https://nvbugs/5760737][test] only skip mooncake+indexerkcache test (#10266)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2026-01-21 09:48:39 +08:00
jthomson04
2db3d7eeba
[None][chore] Async Transfer Manager (#9891)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-01-20 12:12:47 -05:00
Yi Zhang
58311b2345
[None][fix] Remove unused params in attn (#10652)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-01-20 03:08:59 -05:00
benzh-2025
4c8468c5d3
[None][fix] default disable gemm+allreduce fusion (#10656)
2026-01-20 12:31:17 +08:00
Bo Li
f3a985ce27
[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. (#10539)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-20 11:08:04 +08:00
Liao Lanyu
dbb858ae0c
[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-01-20 10:31:13 +08:00
Tian Zheng
cfebfbb505
[https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00