dongfengy
6ee8dbfe0b
[ https://nvbugs/5772396 ][fix] WAR: Disable TinyGEMM PDL due to accuracy issues ( #10619 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-13 12:40:11 -05:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce ( #9729 )
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Void
7d16f3a28b
[ https://nvbugs/5788127 ][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow ( #10499 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Pengbo Wang
c0e25e5418
[TRTLLM-10022][feat] Add hopper xqa decode support for skip softmax attention ( #10264 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-11 19:26:10 -05:00
Min Yu
9cae7277ea
[ https://nvbugs/5726962 ][feat] Apply fusion for W4AFP8_AWQ MoE ( #9838 )
...
Signed-off-by: Min Yu <171526537+yumin066@users.noreply.github.com>
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
Co-authored-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-06 10:16:41 +08:00
Cheng Hang
656c705ff1
[None][feat] sm100 weight-only kernel ( #10190 )
...
Signed-off-by: Cheng Hang <chang@nvidia.com>
2026-01-05 09:44:36 +08:00
Bo Deng
9e7b50aefb
[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues ( #10285 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-01-03 14:34:55 +08:00
Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… ( #10229 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
Zhenhuan Chen
8462cf6c96
[TRTLLM-9578][feat] make PDL enabled by default ( #9695 )
...
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-25 07:15:24 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
...
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
Jonas Li
ecea71ca7a
[None][chore] Update tinygemm kernel name ( #10248 )
...
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-12-24 02:33:25 -05:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Shiyu Li
3ddc9d2b48
[ https://nvbugs/5729697 ][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. ( #10062 )
...
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
Bo Li
cc1323be24
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. ( #10197 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. ( #9822 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Perkz Zheng
c87f1a6b39
[ https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs ( #10089 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
longcheng-nv
b882393d69
[ https://nvbugs/5720357 ][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case ( #10027 )
...
Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 14:58:01 -05:00
CarstyYou
0b279f4ad4
[ https://nvbugs/5456493 ][feat] Add fp8 bmm on sm120 ( #9687 )
...
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-12-18 22:57:20 +08:00
Nikita Korobov
3b4f26e4d1
[None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 ( #9734 )
...
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-18 11:58:23 +01:00
Perkz Zheng
064b67e40c
[ https://nvbugs/5727952 ][fix] a pdl bug in trtllm-gen fmha kernels ( #9913 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-16 00:34:37 -08:00
Yihan Wang
6b5ebaae3e
[None][chore] Update internal_cutlass_kernels artifacts ( #9992 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-15 21:15:25 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend ( #9792 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
Anthony Chang
ad12b795c9
[ https://nvbugs/5661741 ][fix] Fix accuracy issue in TRTLLM MoE introduced in #9377 ( #9999 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 03:31:56 -08:00
Void
dda7658306
[ https://nvbugs/5655885 ][fix] fix invalid instruction error in 2shot ar kernel on Ampere ( #9394 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-15 14:22:56 +08:00
Anthony Chang
3be5f3abcf
[None][fix] Fix regex pattern for cubin filtering ( #9914 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 10:02:48 +08:00
Balaram Buddharaju
9a1750c8f9
[TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing ( #9922 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 11:29:30 -08:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe ( #9852 )
...
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Balaram Buddharaju
461446045e
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 ( #9924 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 16:49:25 -08:00
tburt-nv
6147452158
[ https://nvbugs/4141427 ][chore] Add more details to LICENSE file ( #9881 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision ( #9541 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
ChristinaZ
b8a5159fad
[None][feat] Enable PDL for indexer topK ( #9843 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-11 14:31:23 +08:00
Brian K. Ryu
8cec2da375
[None][feat] Port fp4 quantization kernel optimization from FlashInfer ( #9854 )
...
Signed-off-by: Brian Ryu <bryu@nvidia.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-10 13:13:48 +01:00
Perkz Zheng
e34302986d
[ https://nvbugs/5727952 ][fix] PDL bugs with trtllm-gen fmha kernels ( #9863 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-10 01:47:03 -08:00
Bo Li
9d3c675a0b
[None][chore] Support larger topK for NVLinkOneSided AlltoAll. ( #9816 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-10 11:10:55 +08:00
Tri Dao
1c4dacb19a
[None][fix] Fix PDL in TRTLLM MOE for dsv3 ( #9799 )
...
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
2025-12-09 10:16:29 +08:00
Ludwig Schneider
41ce14ab04
[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce ( #9314 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2025-12-07 09:43:26 -08:00
Enwei Zhu
7cd5a67e25
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP ( #9592 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-05 22:08:52 -08:00
Perkz Zheng
992781dc7b
[None][feat] update trtllm-gen nvfp4 kernels with better performance ( #9510 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-03 21:35:49 +08:00
brb-nv
43f6ad7813
[ https://nvbugs/5708475 ][fix] Fix e2e eval accuracy for helix parallelism ( #9647 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-03 15:13:59 +08:00
Enwei Zhu
34e2fa5c96
[ https://nvbugs/5690172 ][fix] Fix Qwen3-235B ATP accuracy issue with PDL ( #9530 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 09:10:21 +08:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism ( #9342 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
dominicshanshan
6345074686
[None][chore] Weekly mass integration of release/1.1 -- rebase ( #9522 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-11-29 21:48:48 +08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model ( #9376 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Anthony Chang
4742c130db
[None][feat] Improve TRTLLM MoE in small hidden size throughput cases ( #9377 )
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-11-25 09:09:27 +01:00
bhsueh_NV
1a93583438
[None][feat] Support Yarn on QwQ-32B model ( #9059 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>
2025-11-25 07:27:28 +08:00
YueWeng
336593cac5
[None][fix] Fix topk outIndices when using vectorized_process ( #9404 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-24 09:08:00 -08:00
Bo Li
fcfec93cad
[TRTLLM-9389][chore] Rename AlltoAll backend names ( #9329 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-23 13:52:57 -08:00
Chenghao Zhang
564989865c
[TRTLLM-9082][feat] AutoDeploy: Move the moe Align kernel to AOT ( #9106 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-21 16:05:48 -08:00