Guoming Zhang
93ac0bc1dc
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… ( #10229 )
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-27 22:48:10 +08:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
Jin Li
7e4cef9def
[None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 ( #9446 )
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-25 10:23:04 -05:00
Zhenhuan Chen
8462cf6c96
[TRTLLM-9578][feat] make PDL enabled by default ( #9695 )
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-12-25 07:15:24 -05:00
Gabriel Wu
1d01214ff0
[None][feat] Drop non-deepgemm fp8 block scale gemm ( #10256 )
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
2025-12-25 14:52:52 +08:00
Jonas Li
ecea71ca7a
[None][chore] Update tinygemm kernel name ( #10248 )
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
2025-12-24 02:33:25 -05:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Roey Azran
8408c40d8b
[ https://nvbugs/5702786 ][fix] Fix race conditions in KV cache communication during unexpected termination ( #10076 )
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
2025-12-23 14:09:51 +02:00
Shiyu Li
3ddc9d2b48
[ https://nvbugs/5729697 ][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. ( #10062 )
Signed-off-by: Shiyu Li <shili@nvidia.com>
2025-12-23 16:07:07 +08:00
Bo Li
cc1323be24
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. ( #10197 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-23 02:13:37 -05:00
Bo Li
472fe497dc
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. ( #9822 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-22 05:57:12 -05:00
Perkz Zheng
c87f1a6b39
[ https://nvbugs/5503479 ][fix] update trtllm-gen kernels to address few bugs ( #10089 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-22 04:45:33 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Enwei Zhu
2ce785f39a
[ https://nvbugs/5643631 ][fix] Fix hostfunc seg fault ( #10028 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 07:58:43 -05:00
longcheng-nv
b882393d69
[ https://nvbugs/5720357 ][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case ( #10027 )
Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-19 14:58:01 -05:00
Wangjue Yao
9f283f330b
[None][feat] Support Mooncake transfer engine as a cache transceiver backend ( #8309 )
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
2025-12-19 10:09:51 +08:00
Chuang Zhu
e0b2a94309
[None][fix] Fix ready signal in NIXL backend ( #10000 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-19 09:43:40 +08:00
Enwei Zhu
6fe89ea00f
[TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output ( #9840 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-18 10:36:38 -08:00
CarstyYou
0b279f4ad4
[ https://nvbugs/5456493 ][feat] Add fp8 bmm on sm120 ( #9687 )
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
2025-12-18 22:57:20 +08:00
Nikita Korobov
3b4f26e4d1
[None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 ( #9734 )
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-18 11:58:23 +01:00
Perkz Zheng
064b67e40c
[ https://nvbugs/5727952 ][fix] a pdl bug in trtllm-gen fmha kernels ( #9913 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-16 00:34:37 -08:00
Yihan Wang
6b5ebaae3e
[None][chore] Update internal_cutlass_kernels artifacts ( #9992 )
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-15 21:15:25 -08:00
ChristinaZ
dff77efa2a
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend ( #9792 )
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-15 19:59:08 -08:00
Anthony Chang
ad12b795c9
[ https://nvbugs/5661741 ][fix] Fix accuracy issue in TRTLLM MoE introduced in #9377 ( #9999 )
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 03:31:56 -08:00
Void
dda7658306
[ https://nvbugs/5655885 ][fix] fix invalid instruction error in 2shot ar kernel on Ampere ( #9394 )
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2025-12-15 14:22:56 +08:00
Yuxian Qiu
7588029763
[None][feat] Async pp send for PPCommTorch. ( #9976 )
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-12-15 14:03:46 +08:00
Anthony Chang
3be5f3abcf
[None][fix] Fix regex pattern for cubin filtering ( #9914 )
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2025-12-15 10:02:48 +08:00
Simeng Liu
f21e2b3329
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. ( #9604 )
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2025-12-15 08:42:30 +08:00
Balaram Buddharaju
9a1750c8f9
[TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing ( #9922 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-14 11:29:30 -08:00
nvxuanyuc
a5a37227d6
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe ( #9852 )
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
2025-12-14 10:47:24 +08:00
Faraz
64d7796234
[None][chore] Add namespace to header to fix tot failure ( #9973 )
2025-12-13 12:18:10 -05:00
Faraz
98d72c7648
[None][feat] spark cublas LUT table for llama-8b-bf16 perf ( #9811 )
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-12-12 22:37:56 -05:00
Balaram Buddharaju
461446045e
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 ( #9924 )
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 16:49:25 -08:00
tburt-nv
6147452158
[ https://nvbugs/4141427 ][chore] Add more details to LICENSE file ( #9881 )
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2025-12-13 08:35:31 +08:00
Chuang Zhu
4cc4cbe926
[ https://nvbugs/5716787 ][fix] terminate nixl running when exiting ( #9785 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2025-12-12 11:15:02 -05:00
Chuang Zhu
9c59c9f920
[ https://nvbugs/5643787 ][fix] remove the war path for notify to itself ( #9834 )
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-12-12 11:10:05 -05:00
Yihan Wang
9df4dad3b6
[None][fix] Introduce inline namespace to avoid symbol collision ( #9541 )
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2025-12-12 23:32:15 +08:00
Yukun He
a6263a127f
[None][chore] Degrade log level in cublas fp4 runner when using default configs ( #9951 )
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-12 18:53:54 +08:00
ChristinaZ
b8a5159fad
[None][feat] Enable PDL for indexer topK ( #9843 )
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-12-11 14:31:23 +08:00
Brian K. Ryu
8cec2da375
[None][feat] Port fp4 quantization kernel optimization from FlashInfer ( #9854 )
Signed-off-by: Brian Ryu <bryu@nvidia.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
2025-12-10 13:13:48 +01:00
Perkz Zheng
e34302986d
[ https://nvbugs/5727952 ][fix] PDL bugs with trtllm-gen fmha kernels ( #9863 )
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-10 01:47:03 -08:00
Bo Li
9d3c675a0b
[None][chore] Support larger topK for NVLinkOneSided AlltoAll. ( #9816 )
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-10 11:10:55 +08:00
Jiagan Cheng
4a3a66b124
[ https://nvbugs/5677746 ][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang ( #9659 )
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-12-08 18:43:52 -08:00
Tri Dao
1c4dacb19a
[None][fix] Fix PDL in TRTLLM MOE for dsv3 ( #9799 )
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
2025-12-09 10:16:29 +08:00
Jhao-Ting Chen
0a09465089
[ https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model ( #8383 )
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 11:16:05 -08:00
Ludwig Schneider
41ce14ab04
[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce ( #9314 )
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2025-12-07 09:43:26 -08:00
Enwei Zhu
7cd5a67e25
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP ( #9592 )
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-05 22:08:52 -08:00
QI JUN
0915c4e3a1
[TRTLLM-9086][doc] Clean up TODOs in documentation ( #9292 )
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
Iman Tabrizian
9425f7fe3a
[ https://nvbugs/5601682 ][fix] Fix cacheTransceiver hang ( #9311 )
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2025-12-05 17:50:12 -05:00
zackyoray
398d24232d
[None][feat] Add NIXL-LIBFABRIC support ( #9225 )
Signed-off-by: Yoray Zack <62789610+zackyoray@users.noreply.github.com>
Signed-off-by: zackyoray <yorayz@nvidia.com>
2025-12-04 15:38:06 +08:00