TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-04 10:11:47 +08:00

Author	SHA1	Message	Date
Chenghao Zhang	0b748d5bba	[None][chore] update flashinfer to 0.6.0 (#10522 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 16:22:06 -05:00
Kyungmin Lee	25148d3fee	[None][feat] Support new Transformers RoPE configuration format (#10636 ) Signed-off-by: lkm2835 <lkm2835@gmail.com>	2026-01-14 19:41:27 +09:00
Kaiyu Xie	1c69aad850	[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-09 09:50:57 -05:00
Chang Liu	78bb245554	[https://nvbugs/5787453 ][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552 )	2026-01-09 00:49:39 -08:00
Fanrong Li	a34aa63685	[https://nvbugs/5767223 ][feat] add pp support for DeepSeek-v3.2 (#10449 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-07 12:29:51 +08:00
Kaiyu Xie	2eaabd7461	[None] [fix] Fix undefined tokens_per_block (#10438 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-06 02:42:37 -05:00
Fanrong Li	4931c5eb3a	[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 16:43:42 +08:00
Fanrong Li	b5a1e10bc0	[https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata (#10393 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 09:43:44 +08:00
Jin Li	c04563657e	[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile (#9740 ) Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>	2025-12-27 00:07:20 +08:00
Xianjie Qiao	53b81783b1	[None][fix] Fix pageable H2D memcopy issue on GB200 (#10289 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2025-12-25 18:15:57 +08:00
Fanrong Li	156f6453dc	[TRTLLM-9798][feat] Change to use new DeepGEMM MQA sm100 kernel for MTP-3 (#10226 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-24 14:39:12 +08:00
Balaram Buddharaju	5266475014	[None][feat] Cudagraph updates for helix parallelism (#10141 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-21 15:21:52 -05:00
Bo Li	a66eeab537	[TRTLLM-9805][feat] Skip Softmax Attention. (#9821 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com> Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2025-12-21 02:52:42 -05:00
Yechan Kim	8ba8699f66	[TRTLLM-8310][feat] Add Qwen3-VL-MoE (#9689 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-12-15 20:05:20 -08:00
Fanrong Li	8f144d9282	[TRTLLM-9416][feat] Skip DS-v3.2 indexer MQA and Top-K for short sequences. (#9524 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-12-15 12:42:25 +08:00
Balaram Buddharaju	6a6e41f802	[TRTLLM-9468][chore] Update disagg benchmarking scripts to support context parallelism (#9720 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-12-12 22:29:41 -08:00
Jhao-Ting Chen	da074be037	[None][fix] Fix #8383 introduced TRTLLM backend python error (#9804 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-08 13:31:37 -08:00
Jhao-Ting Chen	0a09465089	[https://nvbugs/5567586 ][feat] Ampere xqa swa specdec for GPT-OSS Eagle3-one-model (#8383 ) Signed-off-by: Jhao-Ting Chen <jhaotingc@nvidia.com>	2025-12-08 11:16:05 -08:00
Iman Tabrizian	356a52edf5	[None][feat] Add support for KVCache reuse for DSv32 (#9383 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-12-02 11:14:30 +08:00
heyuhhh	6e470aab72	[None] [feat] Optimize the algorithm part of RocketKV (#9333 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-12-01 09:04:09 +08:00
brb-nv	b77f4ffe54	[TRTLLM-5971][feat] Integrate helix parallelism (#9342 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2025-11-29 15:17:30 -08:00
Chang Liu	b10137fdd5	[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model (#9376 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-26 16:38:25 +08:00
JunyiXu-nv	b7308a4000	[https://nvbugs/5580099 ][fix] Cherry pick IMA issue fix from release/1.1 (#9032 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2025-11-26 13:09:06 +08:00
YueWeng	cc336c4abd	[TRTLLM-8160][feat] Add draft token tree runtime on CDL (#8586 ) Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>	2025-11-25 09:40:55 -05:00
Yukun He	39076410a8	[https://nvbugs/5676748 ][fix] Fix mismatched nvfp4 gemm sf shape. (#9336 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2025-11-24 12:16:32 +08:00
Chang Liu	79a6c9742b	[None][fix] Use fp32 for indexer weight_proj GEMM (#9243 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-19 21:52:38 -08:00
ChristinaZ	941a54c66a	[None][feat] Update the indexer topK (#9255 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-19 11:49:00 +08:00
Ziyi Xiong	7c4344b92e	[https://nvbugs/5590408 ][fix] Exclude num of draft tokens from mMaxSeqLenKv (#9210 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2025-11-18 15:41:56 -05:00
Chang Liu	8e001dd195	[None][fix] DeepSeek V3.2 indexer RoPE fix (#9232 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-18 20:35:27 +08:00
sunnyqgg	7862b15a65	[TRTLLM-8778][feat] Add tree attention support for blackwell arch (#8975 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-11-17 09:01:53 +08:00
heyuhhh	f07e9977c6	[None] [feat] Use triton kernels for RocketKV prediction module (#8682 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-11-13 18:51:09 -08:00
Chang Liu	c37924f37b	[None][fix] Clear indexer k cache reference before release cuda memory (#9110 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-12 22:12:53 -08:00
Zhang Ge	49df731b96	[#6507 ][fix] Fix precision issue due to KV layout mismatch for split/concat kernels (#6917 ) Signed-off-by: ZhangGe6 <sjtu.zg123@gmail.com> Co-authored-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2025-11-13 12:14:58 +08:00
Fanrong Li	780d4f9dc5	[None][feat] Add MTP>1 support for DS-v3.2 (#9045 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-12 09:56:12 -08:00
Chang Liu	7ceb5e5ab6	[TRTLLM-9198][perf] Add torch.compile + multi-stream support for k-cache scatter and weight scaling (#8988 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com> Co-authored-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2025-11-11 12:33:30 +08:00
ChristinaZ	2e7769d1e8	[None][feat] Add customized topk and related unit tests for DSA (#8882 ) Signed-off-by: Christina Zhang <83400082+ChristinaZ@users.noreply.github.com>	2025-11-10 03:35:35 -08:00
bhsueh_NV	e8d4a56dd0	[None][fix] fix eagle3 accuracy issue on sm120 (#8944 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2025-11-10 14:02:03 +08:00
Chang Liu	7081f254cf	[None][perf] Add custom indexer k cache scatter op (#8960 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-11-07 11:24:26 -08:00
yunruis	51545560da	[TRTLLM-8803][feat] Add rope and uk-bgemm overlap for mla generation (#8495 ) Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>	2025-11-06 17:39:57 +08:00
Chang Liu	e57d83c5dc	[TRTLLM-8768][chore] Fuse QK down_proj with indexer K + weight_proj for FP4 ckpt (#8771 )	2025-11-05 07:57:09 -08:00
HuiGao-NV	97674c3114	[TRTLLM-8690][feat] add more tensors to share buffers (#8691 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-11-03 21:08:01 -08:00
Fanrong Li	f0dc746738	[TRTLLM-8541][feat] Add trtllm-gen sparse MLA kernels to support per-Tensor FP8 KV Cache (#8692 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com> Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-31 14:38:31 -07:00
Iman Tabrizian	ae6875fe10	[TRTLLM-8976][feat] Move indexer-k-cache to KVCacheManager (#8699 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2025-10-29 08:04:26 -07:00
Chang Liu	5f737b8dbe	[None][perf] Use fp8 quant kernel in DS3.2 indexer module (#8701 ) Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>	2025-10-29 12:45:09 +08:00
Chang Liu	e47c787dd7	[TRTLLM-8535][feat] Support DeepSeek V3.2 with FP8 + BF16 KV cache/NVFP4 + BF16 KV cache (#8405 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com> Signed-off-by: Tracin <10434017+Tracin@users.noreply.github.com>	2025-10-24 13:40:41 -04:00
sunnyqgg	ea3e0eea51	[TRTLLM-7954][feat] Target model KV cache rellocation (#8421 ) Signed-off-by: qgai <qgai@nvidia.com>	2025-10-23 09:36:50 +08:00
Fanrong Li	0d20a8fd61	[TRTLLM-8536][feat] Add the sparse attention framework and one use case--RocketKV support (#8086 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com> Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com> Co-authored-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2025-10-14 08:23:16 -07:00
HuiGao-NV	f4d3be4bbc	[None][feat] Add a standalone buffer cache class and reuse buffers between cduagraph and no-graph flow (#7669 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2025-09-26 07:28:06 -07:00
Yechan Kim	f77aca9f2c	[TRTLLM-7385][feat] Optimize Qwen2/2.5-VL performance (#7250 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2025-09-22 03:40:02 -07:00
Bo Li	a15f08db3d	[https://nvbugs/5467548 ][fix] DeepSeek illegal memory access. (#7298 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2025-09-22 14:28:38 +08:00

1 2 3

122 Commits