benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce ( #9729 )
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Void
7d16f3a28b
[ https://nvbugs/5788127 ][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow ( #10499 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Iman Tabrizian
48b09e5a25
[ https://nvbugs/5689235 ][fix] Fix cancellation+chunked prefill+disagg ( #10111 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention ( #10264 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Min Yu
9cae7277ea
[ https://nvbugs/5726962 ][feat] Apply fusion for W4AFP8_AWQ MoE ( #9838 )
...
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-06 10:16:41 +08:00
Chuang Zhu
536a8f6a9c
[TRTLLM-9527][feat] Add transferAgent binding (step 1) ( #10113 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-06 08:40:38 +08:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order ( #10350 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch ( #9898 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. ( #8531 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00
Cheng Hang
656c705ff1
[None][feat] sm100 weight-only kernel ( #10190 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
2026-01-05 09:44:36 +08:00
dongfengy
afc533193d
[None][feat] Support nvfp4 for gptoss ( #8956 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-04 08:57:44 -05:00
Ludwig Schneider
59045a0e41
[None][fix] [fix] Make NCCL resource manager destructor exception-safe ( #10166 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-01-03 10:25:05 -05:00
Bo Deng
9e7b50aefb
[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues ( #10285 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-03 14:34:55 +08:00
Yueh-Ting (eop) Chen
9cee32ab39
[ https://nvbugs/5625990 ][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager ( #10183 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-12-29 14:29:14 +08:00
Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… ( #10229 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
Jin Li
7e4cef9def
[None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 ( #9446 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-25 10:23:04 -05:00
Zhenhuan Chen
8462cf6c96
[TRTLLM-9578][feat] make PDL enabled by default ( #9695 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-25 07:15:24 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
Jonas Li
ecea71ca7a
[None][chore] Update tinygemm kernel name ( #10248 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-12-24 02:33:25 -05:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Roey Azran
8408c40d8b
[ https://nvbugs/5702786 ][fix] Fix race conditions in KV cache communication during unexpected termination ( #10076 )
...
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-12-23 14:09:51 +02:00
Shiyu Li
3ddc9d2b48
[ https://nvbugs/5729697 ][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. ( #10062 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
Bo Li
cc1323be24
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. ( #10197 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. ( #9822 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Perkz Zheng
c87f1a6b39
[ https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs ( #10089 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Enwei Zhu
2ce785f39a
[ https://nvbugs/5643631 ][fix] Fix hostfunc seg fault ( #10028 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 07:58:43 -05:00
longcheng-nv
b882393d69
[ https://nvbugs/5720357 ][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case ( #10027 )
...
Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 14:58:01 -05:00
Wangjue Yao
9f283f330b
[None][feat] Support Mooncake transfer engine as a cache transceiver backend ( #8309 )
...
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-19 10:09:51 +08:00
Chuang Zhu
e0b2a94309
[None][fix] Fix ready signal in NIXL backend ( #10000 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-19 09:43:40 +08:00
Enwei Zhu
6fe89ea00f
[TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output ( #9840 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-18 10:36:38 -08:00
CarstyYou
0b279f4ad4
[ https://nvbugs/5456493 ][feat] Add fp8 bmm on sm120 ( #9687 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-12-18 22:57:20 +08:00
Nikita Korobov
3b4f26e4d1
[None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 ( #9734 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-18 11:58:23 +01:00
Perkz Zheng
064b67e40c
[ https://nvbugs/5727952 ][fix] a pdl bug in trtllm-gen fmha kernels ( #9913 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-16 00:34:37 -08:00
Yihan Wang
6b5ebaae3e
[None][chore] Update internal_cutlass_kernels artifacts ( #9992 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-15 21:15:25 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend ( #9792 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
Anthony Chang
ad12b795c9
[ https://nvbugs/5661741 ][fix] Fix accuracy issue in TRTLLM MoE introduced in #9377 ( #9999 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 03:31:56 -08:00
Void
dda7658306
[ https://nvbugs/5655885 ][fix] fix invalid instruction error in 2shot ar kernel on Ampere ( #9394 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-15 14:22:56 +08:00
Yuxian Qiu
7588029763
[None][feat] Async pp send for PPCommTorch. ( #9976 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-15 14:03:46 +08:00
Anthony Chang
3be5f3abcf
[None][fix] Fix regex pattern for cubin filtering ( #9914 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 10:02:48 +08:00
Simeng Liu
f21e2b3329
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. ( #9604 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-15 08:42:30 +08:00
Balaram Buddharaju
9a1750c8f9
[TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing ( #9922 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 11:29:30 -08:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe ( #9852 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Faraz
64d7796234
[None][chore] Add namespace to header to fix tot failure ( #9973 )
2025-12-13 12:18:10 -05:00
Faraz
98d72c7648
[None][feat] spark cublas LUT table for llama-8b-bf16 perf ( #9811 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-12-12 22:37:56 -05:00
Balaram Buddharaju
461446045e
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 ( #9924 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 16:49:25 -08:00
tburt-nv
6147452158
[ https://nvbugs/4141427 ][chore] Add more details to LICENSE file ( #9881 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
Chuang Zhu
4cc4cbe926
[ https://nvbugs/5716787 ][fix] terminate nixl running when exiting ( #9785 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-12 11:15:02 -05:00
Chuang Zhu
9c59c9f920
[ https://nvbugs/5643787 ][fix] remove the war path for notify to itself ( #9834 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:10:05 -05:00