Rundong Li
f1b85fea4c
[None][feat] Integrate cuda.tile RMS norm kernels ( #9725 )
...
Signed-off-by: Rundong (David) Li <davidli@nvidia.com>
Co-authored-by: Jinman Xie <jinmanx@nvidia.com>
Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com>
Co-authored-by: Qiqi Xiao <qiqix@nvidia.com>
Co-authored-by: Biao Wang <biaow@nvidia.com>
Co-authored-by: Thomas Schmid <thschmid@nvidia.com>
2026-02-02 19:44:27 +08:00
Michal Guzek
fafc22e3d4
[ https://nvbugs/5691730 ][fix] Have LoRA bf16 ckpts work with Llama 3.3-70B-fp8 ( #9808 )
...
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Guoming Zhang
6bace84167
[TRTLLM-10398][feat] Enable TRTLLM moe backend for Nemotron Super ( #10791 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-31 13:48:25 +08:00
Balaram Buddharaju
c7a86f89de
[TRTLLM-10264][feat] Support attention DP + Helix CP ( #10477 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-29 02:57:13 -05:00
Tailing Yuan
91528365a9
[None][feat] Add performance alignment to layer-wise benchmarks ( #11018 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-29 14:01:51 +08:00
Enwei Zhu
34a730aaf7
[None][fix] Fix enable_alltoall passed to CutlassFusedMoE ( #11016 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-29 12:11:07 +08:00
Anish Shanbhag
24ac86c485
[ https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency ( #10471 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-28 19:56:32 -08:00
Wanli Jiang
4a206351bb
[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer ( #10757 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-27 13:04:40 +08:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super ( #10754 )
...
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. ( #10885 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV ( #10813 )
...
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Enwei Zhu
72ef732bcf
[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark ( #10279 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-25 21:02:30 +08:00
Yanchao Lu
ae58a7ed20
[None][chore] Revert NVIDIA/TensorRT-LLM#10819 ( #10870 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yanchao Lu
18f63dfcec
[None][chore] Reduce tedious logs ( #10819 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Leslie Fang
31d04dfa12
[TRTLLM-9108][feat] Add multi-GPU test for configurable MoE module ( #10699 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-23 10:16:58 +08:00
彭晋韬(jtao peng)
9beb971827
[None][fix] Update RMSNorm custom op plumbing ( #10843 )
...
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-22 21:03:22 +08:00
Jiayu Chang
1dc49b266e
[ https://nvbugs/5322131 ][feat] Multi-LoRA serving with CUDA Graph ( #8279 )
...
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-01-22 14:01:18 +01:00
shuyixiong
fd2af8d58a
[TRTLLM-9771][feat] Support partial weight update for fp8 ( #10456 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>
2026-01-22 14:46:05 +08:00
xxi
9feebb3a27
[None][chore] switch to ConfigurableMoE as the default path ( #10792 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-21 15:57:38 +08:00
Yanchao Lu
ccf4d79c6c
[None][chore] Revert NVIDIA/TensorRT-LLM#10847 ( #10869 )
2026-01-21 11:08:40 +08:00
Yanchao Lu
ae8f74b620
[None][chore] Reduce tedious logs ( #10847 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-20 22:56:24 +08:00
benzh-2025
4c8468c5d3
[None][fix] Disable gemm+allreduce fusion by default ( #10656 )
2026-01-20 12:31:17 +08:00
Bo Li
f3a985ce27
[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. ( #10539 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-20 11:08:04 +08:00
Void
f7de285a82
[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api ( #10072 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-14 22:15:29 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel ( #9905 )
...
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 ( #9818 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
xxi
d8862505b9
[None][chore] enable EPLB for DEEPGEMM ( #10617 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:28:08 -05:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support ( #10532 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
Leslie Fang
795e690bca
[ https://nvbugs/5753788 ][chore] Padding empty chunk for configurable moe ( #10451 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 10:42:17 +08:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce ( #9729 )
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. ( #10347 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe ( #10401 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support ( #10355 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test ( #10435 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA ( #10571 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
dongfengy
afc533193d
[None][feat] Support nvfp4 for gptoss ( #8956 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-04 08:57:44 -05:00
Jin Li
ef1d4a40b5
[ https://nvbugs/5727475 ][fix] Avoid using property with setter in nn.Mo… ( #10212 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-31 06:21:36 -05:00
Jin Li
c04563657e
[TRTLLM-7735][feat] Attention NVFP4 out support for torch compile ( #9740 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-27 00:07:20 +08:00
Wanli Jiang
14554ab3f3
[None][feat] Support multi-gpu running for nemotron-v3-nano and super ( #10118 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-26 11:23:14 +08:00
ZhichenJiang
46e4af5688
[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations ( #10201 )
...
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-25 09:04:20 -05:00
Balaram Buddharaju
8c1cfc872b
[TRTLLM-9493][feat] Custom AllToAll for helix parallelism ( #9986 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-23 18:14:30 -08:00
Tailing Yuan
648196f8ae
[TRTLLM-9432][feat] Reduce synchronization and recompilation for qwen3-next ( #9691 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-23 10:14:29 +08:00
Faraz
f05af48bca
[ https://nvbugs/5747674 ][fix] Add contiguous() before view() in load_expert_w3_w1_weight and load ( #10136 )
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2025-12-22 21:03:34 -05:00
JadoTu
7421224d69
[None][fix] NVFP4 linear method's weight and weight_scale padding ( #10148 )
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2025-12-22 15:00:31 +08:00
Balaram Buddharaju
5266475014
[None][feat] Cudagraph updates for helix parallelism ( #10141 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-21 15:21:52 -05:00
shuyixiong
4fc6036276
[ https://nvbugs/5702793 ][fix] Fix view operation on uncontiguous tensor ( #10147 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2025-12-21 11:47:20 -05:00
xxi
5ae154022a
[TRTLLM-9872][fix] Clear the failed test at CI when enable_configurab… ( #10067 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2025-12-21 08:14:50 -05:00
Bo Li
a66eeab537
[TRTLLM-9805][feat] Skip Softmax Attention. ( #9821 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2025-12-21 02:52:42 -05:00
Enwei Zhu
21a93fbf9d
[TRTLLM-9992][perf] Enable PDL for CuteDSL kernels and overlap MoeOutputMemset ( #10043 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-20 03:12:41 -05:00
Enwei Zhu
6fe89ea00f
[TRTLLM-9819][perf] Reuse alltoall workspace for CuteDSL MoE output ( #9840 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-18 10:36:38 -08:00