TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-16 07:53:55 +08:00

Author	SHA1	Message	Date
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754 ) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Lucas Liebenwein	00f341be49	[#8982 ][feat] AutoDeploy attention dp support (#10728 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Bo Li	e405468230	[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-26 17:59:03 +08:00
Enwei Zhu	ffab217974	[None][fix] Fix CuteDSL MoE unittest (#10983 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-26 08:34:17 +08:00
Enwei Zhu	72ef732bcf	[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-25 21:02:30 +08:00
Leslie Fang	31d04dfa12	[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-23 10:16:58 +08:00
William Zhang	2146c23786	[#9306 ][refactor] Refactor AutoDeployConfig into LlmArgs (#10613 ) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski	d8e6e22060	[https://nvbugs/5819002 ][fix] fix sharding tests (#10775 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-22 20:02:48 +01:00
Venky	b3146d095d	[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124 ) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-22 07:24:11 -08:00
Jiayu Chang	1dc49b266e	[https://nvbugs/5322131 ][feat] Multi-LoRA serving with CUDA Graph (#8279 ) Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>	2026-01-22 14:01:18 +01:00
tcherckez-nvidia	128d4ac5be	[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803 ) Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>	2026-01-22 13:08:05 +02:00
shuyixiong	fd2af8d58a	[TRTLLM-9771][feat] Support partial update weight for fp8 (#10456 ) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com> Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-22 14:46:05 +08:00
Yukun He	bf7303c7f1	[https://nvbugs/5636916 ][fix] Cherry-pick #10654 : Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 17:25:40 +08:00
Yibin Li	9116dfbacd	[https://nvbugs/5775021 ] [fix] Replace pickle.load with restricted Unpickler (#10622 ) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2026-01-21 11:42:54 +08:00
jthomson04	2db3d7eeba	[None][chore] Async Transfer Manager (#9891 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-01-20 12:12:47 -05:00
Yi Zhang	58311b2345	[None][fix] Remove unused params in attn (#10652 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-01-20 03:08:59 -05:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656 )	2026-01-20 12:31:17 +08:00
Liao Lanyu	dbb858ae0c	[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-01-20 10:31:13 +08:00
Lucas Liebenwein	9879400479	[#10642 ][feat] AutoDeploy: optimized canonicalize_graph utilities [1/2] (#10675 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-18 13:42:30 -05:00
Eran Geva	4d2916d683	[#10688 ][fix] AutoDeploy Fix CUDA graph batch sizes exceeding max_batch_size (#10687 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 13:31:01 -05:00
Eran Geva	a11f0dbd61	[#10696 ][fix] AutoDeploy prevent torch.export from specializing batch dimension when max_batch_size=1 (#10697 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 10:42:49 +02:00
Grzegorz Kwasniewski	7bf4dd9f63	[TRTLLM-10318][feat] Fixing Nemotron sharding: support for sharding buffers (#10319 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Lucas <11156568+lucaslie@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Lucas <11156568+lucaslie@users.noreply.github.com>	2026-01-17 04:02:06 -05:00
Yukun He	3d16daf696	[None][fix] Fix tmp dir being deleted too early in unit test. (#10740 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-17 13:49:10 +08:00
Frida Hou	069ad68d3c	[None][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-16 16:24:37 -05:00
Chenghao Zhang	b6acd96616	[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727 ) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 12:04:40 -08:00
Stefan Niebler	0cfd08745c	[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2026-01-16 10:52:41 -08:00
xxi	ce561b6a8e	[TRTLLM-9111][feat] MoE test refactor: Extend MoE quantization test utilities with comprehensive quant algorithm support (#10691 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-16 10:26:33 +08:00
heyuhhh	e3f27e06c7	[None][chore] Waive star attention unittests (#10439 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2026-01-16 10:12:32 +08:00
Perkz Zheng	71ccc07d2b	[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261 ) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-15 02:24:25 -05:00
Anish Shanbhag	faa80e73fd	[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-14 21:06:07 -08:00
Lucas Liebenwein	15b43e8a14	[https://nvbugs/5777041 ][fix] fix AutoDeploy ep sharding test (#10460 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-14 21:53:56 -05:00
Yuxian Qiu	39cefd6125	[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 14:05:47 +08:00
Leslie Fang	bc119f5644	[None][chore] Add test configurable moe module (#10575 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-14 07:25:57 +08:00
Frida Hou	bf16fbd86c	[#9283 ][feat] AutoDeploy: separate rms pattern detection from fusion (#9969 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-13 14:57:27 -05:00
benzh-2025	6df2c8a074	[None][feat] add fp4 gemm + allreduce (#9729 ) Signed-off-by: benzh Signed-off-by: benzh-2025	2026-01-13 21:11:13 +08:00
mpikulski	bf7998f1b8	[TRTLLM-9522][test] cover LLM API `multi_modal_embeddings` (#9963 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-12 11:38:22 +01:00
Yechan Kim	8e0d20d901	[TRTLLM-10195][feat] K-EXAONE support (#10355 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-12 00:29:51 +09:00
Chenghao Zhang	38f249b479	[https://nvbugs/5548861 ][fix] AutoDeploy: Fix the test (#10521 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-09 13:30:24 -08:00
Yechan Kim	7295af68ba	[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-10 00:13:26 +09:00
William Zhang	c0ae6bbdbe	[None][feat] EPD for Qwen3 VL (#10470 ) * Why? We would like to support EPD disaggregated serving for Qwen3 VL. * What? This commit adds such support, and extends existing unit tests for correctness checks. Some minor (protected) interface changes had to be made to the weight mapper as a side-effect. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-08 06:45:54 -05:00
Lucas Liebenwein	30f8455d29	[https://nvbugs/5747878 ][fix] unwaive llama4 scout tests (#10468 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-07 23:33:45 -05:00
Yuxian Qiu	b85c447ceb	[https://nvbugs/5784543 ][fix] Setup dist before using autotuner. (#10491 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-08 10:32:50 +08:00
Lucas Liebenwein	d736c7f290	[https://nvbugs/5761665 ][fix] AutoDeploy: handle bugs for 25.12 dlfw upgrade (#10511 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-07 20:16:53 -05:00
Lucas Liebenwein	6095c80e56	[https://nvbugs/5721907 ][fix] AutoDeploy: improve numerical stability of flashinfer attention test (#10467 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-06 21:11:06 -05:00
Lucas Liebenwein	bb6a3973aa	[https://nvbugs/5732942 ][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-06 19:55:49 -05:00
Mike Iovine	77be1b7572	[https://nvbugs/5749988 ][fix] Remove redundant qwen3 spec dec test (#10387 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-06 11:46:34 -05:00
alel	6b8ae6fa81	[None][feat] CuteDSL MOE FC1 Enhancement (#10088 ) Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>	2026-01-06 09:30:43 +08:00
Anthony Chang	225d3a9001	[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2026-01-05 17:16:12 +01:00
Yukun He	d272f1a9bc	[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 15:44:37 +08:00
Yukun He	0937df2c68	[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 13:44:09 +08:00

753 Commits