Commit Graph

922 Commits

Author SHA1 Message Date
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Yi Sun
f6dab8388d
[https://nvbugs/5813452][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" (#10922)
Signed-off-by: Yi Sun <yisun0618@gmail.com>
2026-01-29 14:38:11 +08:00
Ludwig Schneider
4e10bf8950
[None][fix] nccl symmetric with graceful fallbacks (#11042)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-28 15:43:24 -08:00
Linda
29647d9446
[None][chore] Removing cpp/tensorrt_llm/pybind (#11026)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-28 11:25:11 +01:00
Yuan Tong
30348b2753
[None][fix] Proper conditional compilation of sm10x cubins (#10839)
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-01-28 10:17:51 +08:00
NVShreyas
6c1862fb33
[TRTLLM-10197][chore] Refactor to setup for RNN cache transceiver (#10957)
Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>
2026-01-27 12:23:02 -08:00
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Linda
ce556290c9
[None][chore] Removing pybind11 bindings and references (#10550)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-26 08:19:12 -05:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Patrice Castonguay
d548b29a41
[None][fix] Bugfix/mtp with async scheduler (#10941)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: rongwei <scutizhang@tencent.com>
2026-01-24 07:19:54 -05:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
yuanjingx87
f4b52d3b78
[None][infra] Regenerate out dated lock file (#10940)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-01-23 09:21:03 -08:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
Yi Zhang
d43be7b65e
[None][fix] Avoid Double update for previous batch (#9888)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-01-22 13:15:06 -05:00
Shi Xiaowei
944c304bbb
[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-01-22 10:14:50 -08:00
Jiayu Chang
1dc49b266e
[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Yukun He
bf7303c7f1
[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-21 17:25:40 +08:00
HuiGao-NV
1592dfab6d
[https://nvbugs/5740377][fix] Lock resource to fix potential access to released data (#10827)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-21 14:17:29 +08:00
Daniel Stokes
2f3b2a3172
[None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes (#9532)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-21 10:14:39 +08:00
Zheng Duan
26c23cf99f
[https://nvbugs/5760737][test] only skip mooncake+indexerkcache test (#10266)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2026-01-21 09:48:39 +08:00
jthomson04
2db3d7eeba
[None][chore] Async Transfer Manager (#9891)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-01-20 12:12:47 -05:00
Yi Zhang
58311b2345
[None][fix] Remove unused params in attn (#10652)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-01-20 03:08:59 -05:00
benzh-2025
4c8468c5d3
[None][fix] default disable gemm+allreduce fusion (#10656)
2026-01-20 12:31:17 +08:00
Bo Li
f3a985ce27
[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. (#10539)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-20 11:08:04 +08:00
Liao Lanyu
dbb858ae0c
[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-01-20 10:31:13 +08:00
Tian Zheng
cfebfbb505
[https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00
Yukun He
f001c4946d
[https://nvbugs/5782112][fix] Fix hanging issue for MNNVL Allreduce under PP (#10633)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-16 13:03:36 +08:00
Chuang Zhu
8257b67ea5
[https://nvbugs/5791936][fix] Add warning for gen-only paused (#10664)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-16 11:18:24 +08:00
Enwei Zhu
7b8b9ccbaf
[https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-16 11:04:26 +08:00
Thor Johnsen
0998a7bf20
[https://nvbugs/5721661][fix] Prevent out-of-bounds read (#9879)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2026-01-15 10:51:40 -06:00
Lizhi Zhou
93db0d5e18
[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 19:18:21 +08:00
Pengbo Wang
683515b1bd
[None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window (#10335)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-15 15:47:00 +08:00
Perkz Zheng
71ccc07d2b
[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-15 02:24:25 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905)
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support (#10532)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
mpikulski
052c36ddd2
[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-14 10:31:03 +01:00
dongfengy
6ee8dbfe0b
[https://nvbugs/5772396][fix] WAR: Disable TinyGEMM PDL due to accuracy issues (#10619)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-13 12:40:11 -05:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Void
7d16f3a28b
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Min Yu
9cae7277ea
[https://nvbugs/5726962][feat] Apply fusion for W4AFP8_AWQ MoE (#9838)
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-06 10:16:41 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) (#10113)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00