Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super ( #10791 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Yibin Li
322471cdd7
[ https://nvbugs/5825514 ][fix] Add null pointer check to parseNpyHeader ( #10944 )
...
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
This PR addresses known security issues. For the latest NVIDIA Vulnerability Disclosure Information visit https://www.nvidia.com/en-us/security/ , for acknowledgement please reach out to the NVIDIA PSIRT team at PSIRT@nvidia.com
2026-01-30 03:01:33 -05:00
Jin Li
ef268e2062
[TRTLLM-9904][feat] Changes for future KVCacheV2 MTP support ( #11029 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-01-30 01:49:17 -05:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP ( #10477 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Yi Sun
f6dab8388d
[ https://nvbugs/5813452 ][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" ( #10922 )
...
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2026-01-29 14:38:11 +08:00
Ludwig Schneider
4e10bf8950
[None][fix] nccl symmetric with graceful fallbacks ( #11042 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-28 15:43:24 -08:00
Linda
29647d9446
[None][chore] Removing cpp/tensorrt_llm/pybind ( #11026 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-28 11:25:11 +01:00
Yuan Tong
30348b2753
[None][fix] Proper conditional compilation of sm10x cubins ( #10839 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-01-28 10:17:51 +08:00
NVShreyas
6c1862fb33
[TRTLLM-10197][chore] Refactor to setup for RNN cache transceiver ( #10957 )
...
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2026-01-27 12:23:02 -08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) ( #10495 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super ( #10754 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Linda
ce556290c9
[None][chore] Removing pybind11 bindings and references ( #10550 )
...
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-26 08:19:12 -05:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. ( #10885 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV ( #10813 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Patrice Castonguay
d548b29a41
[None][fix] Bugfix/mtp with async scheduler ( #10941 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: rongwei <scutizhang@tencent.com>
2026-01-24 07:19:54 -05:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 ( #10736 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
yuanjingx87
f4b52d3b78
[None][infra] Regenerate out dated lock file ( #10940 )
...
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-23 09:21:03 -08:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu ( #10699 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
Yi Zhang
d43be7b65e
[None][fix] Avoid Double update for previous batch ( #9888 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-01-22 13:15:06 -05:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) ( #10494 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Jiayu Chang
1dc49b266e
[ https://nvbugs/5322131 ][feat] Multi-LoRA serving with CUDA Graph ( #8279 )
...
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id ( #10187 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Yukun He
bf7303c7f1
[ https://nvbugs/5636916 ][fix] Cherry-pick #10654 : Fix accuracy issue of TWO-SHOT AllReduce kernel ( #10841 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-21 17:25:40 +08:00
HuiGao-NV
1592dfab6d
[ https://nvbugs/5740377 ][fix] Lock resource to fix potential access to released data ( #10827 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-21 14:17:29 +08:00
Daniel Stokes
2f3b2a3172
[None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes ( #9532 )
...
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-21 10:14:39 +08:00
Zheng Duan
26c23cf99f
[ https://nvbugs/5760737 ][test] only skip mooncake+indexerkcache test ( #10266 )
...
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2026-01-21 09:48:39 +08:00
jthomson04
2db3d7eeba
[None][chore] Async Transfer Manager ( #9891 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-01-20 12:12:47 -05:00
Yi Zhang
58311b2345
[None][fix] Remove unused params in attn ( #10652 )
...
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-01-20 03:08:59 -05:00
benzh-2025
4c8468c5d3
[None][fix] default disable gemm+allreduce fusion ( #10656 )
2026-01-20 12:31:17 +08:00
Bo Li
f3a985ce27
[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. ( #10539 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-20 11:08:04 +08:00
Liao Lanyu
dbb858ae0c
[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python ( #10273 )
...
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-01-20 10:31:13 +08:00
Tian Zheng
cfebfbb505
[ https://nvbugs/5783509 ][fix] Fix a hang issue when enabling skip softmax on Blackwell ( #10490 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00
Yukun He
f001c4946d
[ https://nvbugs/5782112 ][fix] Fix hanging issue for MNNVL Allreduce under PP ( #10633 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-16 13:03:36 +08:00
Chuang Zhu
8257b67ea5
[ https://nvbugs/5791936 ][fix] Add warning for gen-only paused ( #10664 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-16 11:18:24 +08:00
Enwei Zhu
7b8b9ccbaf
[ https://nvbugs/5669671 ][fix] Support GuidedDecoder with sharded logits ( #10698 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-16 11:04:26 +08:00
Thor Johnsen
0998a7bf20
[ https://nvbugs/5721661 ][fix] Prevent out-of-bounds read ( #9879 )
...
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2026-01-15 10:51:40 -06:00
Lizhi Zhou
93db0d5e18
[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg ( #10406 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 19:18:21 +08:00
Pengbo Wang
683515b1bd
[None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window ( #10335 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-15 15:47:00 +08:00
Perkz Zheng
71ccc07d2b
[None][feat] update trtllm-gen to support groupsTokensHeadsQ ( #10261 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-15 02:24:25 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel ( #9905 )
...
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 ( #9818 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support ( #10532 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
mpikulski
052c36ddd2
[TRTLLM-9522][feat] support image_embeds in OpenAI API ( #9715 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-14 10:31:03 +01:00
dongfengy
6ee8dbfe0b
[ https://nvbugs/5772396 ][fix] WAR: Disable TinyGEMM PDL due to accuracy issues ( #10619 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-13 12:40:11 -05:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce ( #9729 )
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Void
7d16f3a28b
[ https://nvbugs/5788127 ][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow ( #10499 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Iman Tabrizian
48b09e5a25
[ https://nvbugs/5689235 ][fix] Fix cancellation+chunked prefill+disagg ( #10111 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention ( #10264 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Min Yu
9cae7277ea
[ https://nvbugs/5726962 ][feat] Apply fusion for W4AFP8_AWQ MoE ( #9838 )
...
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-06 10:16:41 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) ( #10113 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00