Chenghao Zhang
0b748d5bba
[None][chore] update flashinfer to 0.6.0 ( #10522 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 16:22:06 -05:00
Kyungmin Lee
25148d3fee
[None][feat] Support new Transformers RoPE configuration format ( #10636 )
...
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-14 19:41:27 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA ( #10571 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Chang Liu
78bb245554
[ https://nvbugs/5787453 ][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 ( #10552 )
2026-01-09 00:49:39 -08:00
Fanrong Li
a34aa63685
[ https://nvbugs/5767223 ][feat] add pp support for DeepSeek-v3.2 ( #10449 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
Kaiyu Xie
2eaabd7461
[None] [fix] Fix undefined tokens_per_block ( #10438 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 02:42:37 -05:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch ( #9898 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Fanrong Li
b5a1e10bc0
[ https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata ( #10393 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 09:43:44 +08:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
Xianjie Qiao
53b81783b1
[None][fix] Fix pageable H2D memcopy issue on GB200 ( #10289 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2025-12-25 18:15:57 +08:00
Fanrong Li
156f6453dc
[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 ( #10226 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-24 14:39:12 +08:00
Balaram Buddharaju
5266475014
[None][feat] Cudagraph updates for helix parallelism ( #10141 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 15:21:52 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Yechan Kim
8ba8699f66
[TRTLLM-8310][feat] Add Qwen3-VL-MoE ( #9689 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-12-15 20:05:20 -08:00
Fanrong Li
8f144d9282
[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. ( #9524 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-15 12:42:25 +08:00
Balaram Buddharaju
6a6e41f802
[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism ( #9720 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-12 22:29:41 -08:00
Jhao-Ting Chen
da074be037
[None][fix] Fix #8383 introduced TRTLLM backend python error ( #9804 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 13:31:37 -08:00
Jhao-Ting Chen
0a09465089
[ https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model ( #8383 )
...
Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>
2025-12-08 11:16:05 -08:00
Iman Tabrizian
356a52edf5
[None][feat] Add support for KVCache reuse for DSv32 ( #9383 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-02 11:14:30 +08:00
heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV ( #9333 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism ( #9342 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model ( #9376 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
JunyiXu-nv
b7308a4000
[ https://nvbugs/5580099 ][fix] Cherry pick IMA issue fix from release/1.1 ( #9032 )
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-26 13:09:06 +08:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL ( #8586 )
...
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Yukun He
39076410a8
[ https://nvbugs/5676748 ][fix] Fix mismatched nvfp4 gemm sf shape. ( #9336 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-24 12:16:32 +08:00
Chang Liu
79a6c9742b
[None][fix] Use fp32 for indexer weight_proj GEMM ( #9243 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-19 21:52:38 -08:00
ChristinaZ
941a54c66a
[None][feat] Update the indexer topK ( #9255 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-19 11:49:00 +08:00
Ziyi Xiong
7c4344b92e
[ https://nvbugs/5590408 ][fix] Exclude num of draft tokens from mMaxSeqLenKv ( #9210 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-18 15:41:56 -05:00
Chang Liu
8e001dd195
[None][fix] DeepSeek V3.2 indexer RoPE fix ( #9232 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-18 20:35:27 +08:00
sunnyqgg
7862b15a65
[TRTLLM-8778][feat] Add tree attention support for blackwell arch ( #8975 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-11-17 09:01:53 +08:00
heyuhhh
f07e9977c6
[None] [feat] Use triton kernels for RocketKV prediction module ( #8682 )
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-11-13 18:51:09 -08:00
Chang Liu
c37924f37b
[None][fix] Clear indexer k cache reference before release cuda memory ( #9110 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-12 22:12:53 -08:00
Zhang Ge
49df731b96
[ #6507 ][fix] Fix precision issue due to KV layout mismatch for split/concat kernels ( #6917 )
...
Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com>
Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-13 12:14:58 +08:00
Fanrong Li
780d4f9dc5
[None][feat] Add MTP>1 support for DS-v3.2 ( #9045 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-12 09:56:12 -08:00
Chang Liu
7ceb5e5ab6
[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling ( #8988 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-11 12:33:30 +08:00
ChristinaZ
2e7769d1e8
[None][feat] Add customized topk and related unit tests for DSA ( #8882 )
...
Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>
2025-11-10 03:35:35 -08:00
bhsueh_NV
e8d4a56dd0
[None][fix] fix eagle3 accuracy issue on sm120 ( #8944 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2025-11-10 14:02:03 +08:00
Chang Liu
7081f254cf
[None][perf] Add custom indexer k cache scatter op ( #8960 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-07 11:24:26 -08:00
yunruis
51545560da
[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation ( #8495 )
...
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
2025-11-06 17:39:57 +08:00
Chang Liu
e57d83c5dc
[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt ( #8771 )
2025-11-05 07:57:09 -08:00
HuiGao-NV
97674c3114
[TRTLLM-8690][feat] add more tensors to share buffers ( #8691 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-11-03 21:08:01 -08:00
Fanrong Li
f0dc746738
[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache ( #8692 )
...
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-31 14:38:31 -07:00
Iman Tabrizian
ae6875fe10
[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager ( #8699 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-10-29 08:04:26 -07:00
Chang Liu
5f737b8dbe
[None][perf] Use fp8 quant kernel in DS3.2 indexer module ( #8701 )
...
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-10-29 12:45:09 +08:00
Chang Liu
e47c787dd7
[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache ( #8405 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>
2025-10-24 13:40:41 -04:00
sunnyqgg
ea3e0eea51
[TRTLLM-7954][feat] Target model KV cache rellocation ( #8421 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2025-10-23 09:36:50 +08:00
Fanrong Li
0d20a8fd61
[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support ( #8086 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-10-14 08:23:16 -07:00
HuiGao-NV
f4d3be4bbc
[None][feat] Add a standalone buffer cache class and reuse buffers between cduagraph and no-graph flow ( #7669 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2025-09-26 07:28:06 -07:00
Yechan Kim
f77aca9f2c
[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance ( #7250 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2025-09-22 03:40:02 -07:00
Bo Li
a15f08db3d
[ https://nvbugs/5467548 ][fix] DeepSeek illegal memory access. ( #7298 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2025-09-22 14:28:38 +08:00