Bo Deng
|
9e7b50aefb
|
[TRTLLM-9752][fix] WAR: Disable PDL for quant kernels to fix accuracy issues (#10285)
Signed-off-by: Bo Deng <deemod@nvidia.com>
|
2026-01-03 14:34:55 +08:00 |
|
Yueh-Ting (eop) Chen
|
9cee32ab39
|
[https://nvbugs/5625990][fix] Respect VSWA scheme when doing block store for reuse and load block for reuse in KV cache manager (#10183)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
|
2025-12-29 14:29:14 +08:00 |
|
Guoming Zhang
|
93ac0bc1dc
|
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… (#10229)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
|
2025-12-27 22:48:10 +08:00 |
|
Jin Li
|
c04563657e
|
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-12-27 00:07:20 +08:00 |
|
Jin Li
|
7e4cef9def
|
[None][fix] Cherry-pick conflict changes for PR 7999 PR 8515 (#9446)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
|
2025-12-25 10:23:04 -05:00 |
|
Zhenhuan Chen
|
8462cf6c96
|
[TRTLLM-9578][feat] make PDL enabled by default (#9695)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
|
2025-12-25 07:15:24 -05:00 |
|
Gabriel Wu
|
1d01214ff0
|
[None][feat] Drop non-deepgemm fp8 block scale gemm (#10256)
Signed-off-by: Zihua Wu <13583761+lucifer1004@users.noreply.github.com>
|
2025-12-25 14:52:52 +08:00 |
|
Jonas Li
|
ecea71ca7a
|
[None][chore] Update tinygemm kernel name (#10248)
Signed-off-by: Jonas Li <6110159+longlee0622@users.noreply.github.com>
|
2025-12-24 02:33:25 -05:00 |
|
Balaram Buddharaju
|
8c1cfc872b
|
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism (#9986)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-12-23 18:14:30 -08:00 |
|
Roey Azran
|
8408c40d8b
|
[https://nvbugs/5702786][fix] Fix race conditions in KV cache communication during unexpected termination (#10076)
Signed-off-by: roeya <165803633+RoeyAzran1992@users.noreply.github.com>
|
2025-12-23 14:09:51 +02:00 |
|
Shiyu Li
|
3ddc9d2b48
|
[https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062)
Signed-off-by: Shiyu Li <shili@nvidia.com>
|
2025-12-23 16:07:07 +08:00 |
|
Bo Li
|
cc1323be24
|
[None][fix] Fix the bug for top_k=10 in NVLinkOneSided AlltoAll. (#10197)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-12-23 02:13:37 -05:00 |
|
Bo Li
|
472fe497dc
|
[None][chore] NVLinkOneSided AlltoAll Support zero local_num_tokens. (#9822)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-12-22 05:57:12 -05:00 |
|
Perkz Zheng
|
c87f1a6b39
|
[https://nvbugs/5503479][fix] update trtllm-gen kernels to address few bugs (#10089)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-12-22 04:45:33 -05:00 |
|
Bo Li
|
a66eeab537
|
[TRTLLM-9805][feat] Skip Softmax Attention. (#9821)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
|
2025-12-21 02:52:42 -05:00 |
|
Enwei Zhu
|
2ce785f39a
|
[https://nvbugs/5643631][fix] Fix hostfunc seg fault (#10028)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-12-20 07:58:43 -05:00 |
|
longcheng-nv
|
b882393d69
|
[https://nvbugs/5720357][fix] Fix indice offset overflow in custom Top-K kernel and corresponding UT case (#10027)
Signed-off-by: longcheng-nv <243710427+longcheng-nv@users.noreply.github.com>
Co-authored-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
|
2025-12-19 14:58:01 -05:00 |
|
Wangjue Yao
|
9f283f330b
|
[None][feat] Support Mooncake transfer engine as a cache transceiver backend (#8309)
Signed-off-by: wjueyao <wyao123@terpmail.umd.edu>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
|
2025-12-19 10:09:51 +08:00 |
|
Chuang Zhu
|
e0b2a94309
|
[None][fix] Fix ready signal in NIXL backend (#10000)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-12-19 09:43:40 +08:00 |
|
Enwei Zhu
|
6fe89ea00f
|
[TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output (#9840)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-12-18 10:36:38 -08:00 |
|
CarstyYou
|
0b279f4ad4
|
[https://nvbugs/5456493][feat] Add fp8 bmm on sm120 (#9687)
Signed-off-by: CarstyYou <186021327+CarstyYou@users.noreply.github.com>
|
2025-12-18 22:57:20 +08:00 |
|
Nikita Korobov
|
3b4f26e4d1
|
[None][feat] update TRT-LLM Gen MoE for NvFp4 + bias with tileN=256 (#9734)
Signed-off-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
|
2025-12-18 11:58:23 +01:00 |
|
Perkz Zheng
|
064b67e40c
|
[https://nvbugs/5727952][fix] a pdl bug in trtllm-gen fmha kernels (#9913)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-12-16 00:34:37 -08:00 |
|
Yihan Wang
|
6b5ebaae3e
|
[None][chore] Update internal_cutlass_kernels artifacts (#9992)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
|
2025-12-15 21:15:25 -08:00 |
|
ChristinaZ
|
dff77efa2a
|
[None][feat] Add routing support for the new model for both cutlass and trtllm moe backend (#9792)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-12-15 19:59:08 -08:00 |
|
Anthony Chang
|
ad12b795c9
|
[https://nvbugs/5661741][fix] Fix accuracy issue in TRTLLM MoE introduced in #9377 (#9999)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-12-15 03:31:56 -08:00 |
|
Void
|
dda7658306
|
[https://nvbugs/5655885][fix] fix invalid instruction error in 2shot ar kernel on Ampere (#9394)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
|
2025-12-15 14:22:56 +08:00 |
|
Yuxian Qiu
|
7588029763
|
[None][feat] Async pp send for PPCommTorch. (#9976)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
|
2025-12-15 14:03:46 +08:00 |
|
Anthony Chang
|
3be5f3abcf
|
[None][fix] Fix regex pattern for cubin filtering (#9914)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
|
2025-12-15 10:02:48 +08:00 |
|
Simeng Liu
|
f21e2b3329
|
[TRTLLM-9601][feat] Expose mmKeys for multimodal to integrate with dynamo. (#9604)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
|
2025-12-15 08:42:30 +08:00 |
|
Balaram Buddharaju
|
9a1750c8f9
|
[TRTLLM-9493][noop] Refactor fusedMoeCommKernels to enable code sharing (#9922)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-12-14 11:29:30 -08:00 |
|
nvxuanyuc
|
a5a37227d6
|
[None][feat] Fused kernels (qknormrope + moe routing) and two-model MTP support for glm4moe (#9852)
Signed-off-by: Xuanyu Chen <xuanyuc@nvidia.com>
|
2025-12-14 10:47:24 +08:00 |
|
Faraz
|
64d7796234
|
[None][chore] Add namespace to header to fix tot failure (#9973)
|
2025-12-13 12:18:10 -05:00 |
|
Faraz
|
98d72c7648
|
[None][feat] spark cublas LUT table for llama-8b-bf16 perf (#9811)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
|
2025-12-12 22:37:56 -05:00 |
|
Balaram Buddharaju
|
461446045e
|
[TRTLLM-9493][feat] Add helixPostProcessNative kernel for cp_dim=2 (#9924)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
|
2025-12-12 16:49:25 -08:00 |
|
tburt-nv
|
6147452158
|
[https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
|
2025-12-13 08:35:31 +08:00 |
|
Chuang Zhu
|
4cc4cbe926
|
[https://nvbugs/5716787][fix] terminate nixl running when exiting (#9785)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
|
2025-12-12 11:15:02 -05:00 |
|
Chuang Zhu
|
9c59c9f920
|
[https://nvbugs/5643787][fix] remove the war path for notify to itself (#9834)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
|
2025-12-12 11:10:05 -05:00 |
|
Yihan Wang
|
9df4dad3b6
|
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
|
2025-12-12 23:32:15 +08:00 |
|
Yukun He
|
a6263a127f
|
[None][chore] Degrade log level in cublas fp4 runner when using default configs (#9951)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
|
2025-12-12 18:53:54 +08:00 |
|
ChristinaZ
|
b8a5159fad
|
[None][feat] Enable PDL for indexer topK (#9843)
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
|
2025-12-11 14:31:23 +08:00 |
|
Brian K. Ryu
|
8cec2da375
|
[None][feat] Port fp4 quantization kernel optimization from FlashInfer (#9854)
Signed-off-by: Brian Ryu <bryu@nvidia.com>
Co-authored-by: Nikita Korobov <14355239+nekorobov@users.noreply.github.com>
|
2025-12-10 13:13:48 +01:00 |
|
Perkz Zheng
|
e34302986d
|
[https://nvbugs/5727952][fix] PDL bugs with trtllm-gen fmha kernels (#9863)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
|
2025-12-10 01:47:03 -08:00 |
|
Bo Li
|
9d3c675a0b
|
[None][chore] Support larger topK for NVLinkOneSided AlltoAll. (#9816)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
|
2025-12-10 11:10:55 +08:00 |
|
Jiagan Cheng
|
4a3a66b124
|
[https://nvbugs/5677746][fix] Use first PP rank's schedule result in other PP ranks to fix PP hang (#9659)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
|
2025-12-08 18:43:52 -08:00 |
|
Tri Dao
|
1c4dacb19a
|
[None][fix] Fix PDL in TRTLLM MOE for dsv3 (#9799)
Signed-off-by: Tri Dao <daominhtri0503@gmail.com>
|
2025-12-09 10:16:29 +08:00 |
|
Jhao-Ting Chen
|
0a09465089
|
[https://nvbugs/5567586][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383)
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
|
2025-12-08 11:16:05 -08:00 |
|
Ludwig Schneider
|
41ce14ab04
|
[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
|
2025-12-07 09:43:26 -08:00 |
|
Enwei Zhu
|
7cd5a67e25
|
[TRTLLM-9372][feat] Enable CuteDSL MoE with Large EP (#9592)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
|
2025-12-05 22:08:52 -08:00 |
|
QI JUN
|
0915c4e3a1
|
[TRTLLM-9086][doc] Clean up TODOs in documentation (#9292)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
|
2025-12-05 17:50:12 -05:00 |
|