Lizhi Zhou
f3a41c8d94
[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-21 22:52:34 -05:00
Yukun He
bf7303c7f1
[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-21 17:25:40 +08:00
HuiGao-NV
1592dfab6d
[https://nvbugs/5740377][fix] Lock resource to fix potential access to released data (#10827)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-21 14:17:29 +08:00
Daniel Stokes
2f3b2a3172
[None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes (#9532)
Signed-off-by: djns99 <40156487+djns99@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-21 10:14:39 +08:00
Zheng Duan
26c23cf99f
[https://nvbugs/5760737][test] only skip mooncake+indexerkcache test (#10266)
Signed-off-by: zhengd-nv <200704041+zhengd-nv@users.noreply.github.com>
2026-01-21 09:48:39 +08:00
jthomson04
2db3d7eeba
[None][chore] Async Transfer Manager (#9891)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-01-20 12:12:47 -05:00
Yi Zhang
58311b2345
[None][fix] Remove unused params in attn (#10652)
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-01-20 03:08:59 -05:00
benzh-2025
4c8468c5d3
[None][fix] default disable gemm+allreduce fusion (#10656)
2026-01-20 12:31:17 +08:00
Bo Li
f3a985ce27
[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. (#10539)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-20 11:08:04 +08:00
Liao Lanyu
dbb858ae0c
[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-01-20 10:31:13 +08:00
Tian Zheng
cfebfbb505
[https://nvbugs/5783509][fix] Fix a hang issue when enabling skip softmax on Blackwell (#10490)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-16 18:59:54 +08:00
Yukun He
f001c4946d
[https://nvbugs/5782112][fix] Fix hanging issue for MNNVL Allreduce under PP (#10633)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-16 13:03:36 +08:00
Chuang Zhu
8257b67ea5
[https://nvbugs/5791936][fix] Add warning for gen-only paused (#10664)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-16 11:18:24 +08:00
Enwei Zhu
7b8b9ccbaf
[https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-16 11:04:26 +08:00
Thor Johnsen
0998a7bf20
[https://nvbugs/5721661][fix] Prevent out-of-bounds read (#9879)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2026-01-15 10:51:40 -06:00
Lizhi Zhou
93db0d5e18
[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 19:18:21 +08:00
Pengbo Wang
683515b1bd
[None][feat] Use XQA JIT impl by default and mitigate perf loss with sliding window (#10335)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-15 15:47:00 +08:00
Perkz Zheng
71ccc07d2b
[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-15 02:24:25 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905)
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support (#10532)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
mpikulski
052c36ddd2
[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-14 10:31:03 +01:00
dongfengy
6ee8dbfe0b
[https://nvbugs/5772396][fix] WAR: Disable TinyGEMM PDL due to accuracy issues (#10619)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-13 12:40:11 -05:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Void
7d16f3a28b
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention (#10264)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Min Yu
9cae7277ea
[https://nvbugs/5726962][feat] Apply fusion for W4AFP8_AWQ MoE (#9838)
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-06 10:16:41 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) (#10113)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00
Cheng Hang
656c705ff1
[None][feat] sm100 weight-only kernel (#10190)
Signed-off-by: Cheng Hang <chang@nvidia.com>
2026-01-05 09:44:36 +08:00
dongfengy
afc533193d
[None][feat] Support nvfp4 for gptoss (#8956)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-04 08:57:44 -05:00
Ludwig Schneider
59045a0e41
[None][fix] [fix] Make NCCL resource manager destructor exception-safe (#10166)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-03 10:25:05 -05:00
Bo Deng
9e7b50aefb
[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues (#10285)
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-03 14:34:55 +08:00
Yueh-Ting (eop) Chen
9cee32ab39
[https://nvbugs/5625990][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager (#10183)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-12-29 14:29:14 +08:00
Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… (#10229)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
Jin Li
7e4cef9def
[None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 (#9446)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-25 10:23:04 -05:00
Zhenhuan Chen
8462cf6c96
[TRTLLM-9578][feat] make PDL enabled by default (#9695)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-25 07:15:24 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256)
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
Jonas Li
ecea71ca7a
[None][chore] Update tinygemm kernel name (#10248)
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-12-24 02:33:25 -05:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Roey Azran
8408c40d8b
[https://nvbugs/5702786][fix] Fix race conditions in KV cache communication during unexpected termination (#10076)
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-12-23 14:09:51 +02:00
Shiyu Li
3ddc9d2b48
[https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062)
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
Bo Li
cc1323be24
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. (#10197)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Perkz Zheng
c87f1a6b39
[https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00