TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-06 03:01:50 +08:00

Author	SHA1	Message	Date
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
Lucas Liebenwein	00f341be49	[#8982][feat] AutoDeploy attention dp support (#10728) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Pengyun Lin	ce37e27066	[#10614][fix] gpt_oss first iteration streaming in trtllm-serve (#10808) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-26 20:53:11 +08:00
Bo Li	e405468230	[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-26 17:59:03 +08:00
Enwei Zhu	ffab217974	[None][fix] Fix CuteDSL MoE unittest (#10983) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-26 08:34:17 +08:00
Enwei Zhu	72ef732bcf	[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-25 21:02:30 +08:00
Yao Yao	6f07fa81d7	[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.	2026-01-24 04:48:39 -05:00
Leslie Fang	31d04dfa12	[TRTLLM-9108][feat] Add test configurable moe module multi gpu (#10699) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-23 10:16:58 +08:00
William Zhang	2146c23786	[#9306][refactor] Refactor AutoDeployConfig into LlmArgs (#10613) Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-22 16:02:49 -05:00
Grzegorz Kwasniewski	d8e6e22060	[https://nvbugs/5819002][fix] fix sharding tests (#10775) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-22 20:02:48 +01:00
Shi Xiaowei	944c304bbb	[TRTLLM-9527][feat] Python transceiver components (step 2) (#10494) Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>	2026-01-22 10:14:50 -08:00
Venky	b3146d095d	[TRTC-122][feat] Eagle3 Specdec UX improvements (#10124) Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>	2026-01-22 07:24:11 -08:00
Yan Chunwei	30ffa58b54	[https://nvbugs/5783876][fix] fix hmac launch (#10434) Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>	2026-01-22 23:20:53 +08:00
Pengyun Lin	5e34112b27	[TRTLLM-10388][feat] Support logprobs for Completions API (#10809) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-22 21:25:24 +08:00
Jiayu Chang	1dc49b266e	[https://nvbugs/5322131][feat] Multi-LoRA serving with CUDA Graph (#8279) Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>	2026-01-22 14:01:18 +01:00
tcherckez-nvidia	128d4ac5be	[None][chore] NVFP4 MoE - Move weights transformation to fusion phase… (#10803) Signed-off-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Signed-off-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster> Signed-off-by: tcherckez-nvidia <127761168+tcherckez-nvidia@users.noreply.github.com> Co-authored-by: Tal Cherckez <tcherckez@nvl72070-T11.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72039-T03.cm.cluster> Co-authored-by: Tal Cherckez <tcherckez@nvl72098-T11.cm.cluster>	2026-01-22 13:08:05 +02:00
shuyixiong	fd2af8d58a	[TRTLLM-9771][feat] Support partial update weight for fp8 (#10456) Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com> Signed-off-by: shuyixiong <219646547+shuyixiong@users.noreply.github.com>	2026-01-22 14:46:05 +08:00
Enwei Zhu	be4a431ffd	[TRTLLM-10154][feat] Enable guided decoding with reasoning parsers (#10890) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-22 14:14:28 +08:00
Taylor Yeonbok Lee	895bb94b3d	[#8241][feat] Support model_kwargs for pytorch backend (#10351) Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>	2026-01-21 20:51:38 -08:00
Lizhi Zhou	f3a41c8d94	[TRTLLM-10059][feat] Use global unique id as disagg request id (#10187) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-21 22:52:34 -05:00
Yukun He	bf7303c7f1	[https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-21 17:25:40 +08:00
Yibin Li	9116dfbacd	[https://nvbugs/5775021][fix] Replace pickle.load with restricted Unpickler (#10622) Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>	2026-01-21 11:42:54 +08:00
jthomson04	2db3d7eeba	[None][chore] Async Transfer Manager (#9891) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-01-20 12:12:47 -05:00
Yi Zhang	58311b2345	[None][fix] Remove unused params in attn (#10652) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-01-20 03:08:59 -05:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656)	2026-01-20 12:31:17 +08:00
Liao Lanyu	dbb858ae0c	[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-01-20 10:31:13 +08:00
Zhanrui Sun	df845a028b	[TRTLLM-9581][infra] Use /home/scratch.trt_llm_data_ci in computelab (#10616) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com> Signed-off-by: Zhanrui Sun <184402041+ZhanruiSunCh@users.noreply.github.com>	2026-01-19 00:40:40 -05:00
Lucas Liebenwein	9879400479	[#10642][feat] AutoDeploy: optimized canonicalize_graph utilities [1/2] (#10675) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-18 13:42:30 -05:00
Eran Geva	4d2916d683	[#10688][fix] AutoDeploy Fix CUDA graph batch sizes exceeding max_batch_size (#10687) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 13:31:01 -05:00
Eran Geva	a11f0dbd61	[#10696][fix] AutoDeploy prevent torch.export from specializing batch dimension when max_batch_size=1 (#10697) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 10:42:49 +02:00
Grzegorz Kwasniewski	7bf4dd9f63	[TRTLLM-10318][feat] Fixing Nemotron sharding: support for sharding buffers (#10319) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Lucas <11156568+lucaslie@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Lucas <11156568+lucaslie@users.noreply.github.com>	2026-01-17 04:02:06 -05:00
Yukun He	3d16daf696	[None][fix] Fix tmp dir being deleted too early in unit test. (#10740) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-17 13:49:10 +08:00
Frida Hou	069ad68d3c	[None][fix] AutoDeploy: skip mxfp4_moe test unless on Hopper (#10729) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-16 16:24:37 -05:00
Chenghao Zhang	b6acd96616	[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 12:04:40 -08:00
Stefan Niebler	0cfd08745c	[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2026-01-16 10:52:41 -08:00
xxi	ce561b6a8e	[TRTLLM-9111][feat] MoE test refactor: Extend MoE quantization test utilities with comprehensive quant algorithm support (#10691) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-16 10:26:33 +08:00
heyuhhh	e3f27e06c7	[None][chore] Waive star attention unittests (#10439) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2026-01-16 10:12:32 +08:00
Perkz Zheng	71ccc07d2b	[None][feat] update trtllm-gen to support groupsTokensHeadsQ (#10261) Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-15 02:24:25 -05:00
Anish Shanbhag	faa80e73fd	[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-14 21:06:07 -08:00
Lucas Liebenwein	15b43e8a14	[https://nvbugs/5777041][fix] fix AutoDeploy ep sharding test (#10460) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-14 21:53:56 -05:00
Tzu-Ling Kan	c99faaed06	[#9760][fix] Use RequestError for validation errors to prevent engine shutdown (#9761) Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>	2026-01-14 10:22:36 -05:00
HuiGao-NV	b10704428d	[https://nvbugs/5787566][fix] Only keep a limited number of performance statistic data (#10569) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-14 07:53:01 -05:00
shuyixiong	babd5ecacc	[https://nvbugs/5760740][fix] Enable ray tests (#10272) Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>	2026-01-14 19:25:46 +08:00
mpikulski	052c36ddd2	[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-14 10:31:03 +01:00
Yuxian Qiu	39cefd6125	[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 14:05:47 +08:00
Leslie Fang	bc119f5644	[None][chore] Add test configurable moe module (#10575) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-14 07:25:57 +08:00
Frida Hou	bf16fbd86c	[#9283][feat] AutoDeploy: separate rms pattern detection from fusion (#9969) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-13 14:57:27 -05:00
benzh-2025	6df2c8a074	[None][feat] add fp4 gemm + allreduce (#9729) Signed-off-by: benzh Signed-off-by: benzh-2025	2026-01-13 21:11:13 +08:00
Tailing Yuan	38296a472b	[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-13 19:17:03 +08:00
JunyiXu-nv	e291a834db	[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} (#9937) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2026-01-13 03:57:14 -05:00

1157 Commits