Commit Graph

1808 Commits

Author SHA1 Message Date
JunyiXu-nv
6d2daec5d0
[TRTLLM-8274][feat] Check if executor is shutdown in /health entrypoint (#9057)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-04 13:49:40 +08:00
Tailing Yuan
4eed648e22
[None][feat] Add weights initialization and context phase parser to layer-wise benchmarks (#9667)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-12-04 13:41:15 +08:00
Jin Li
87e0c8a749
[TRTLLM-7073][feat] Support torch compile for PP for Llama and DeepSeekV3 (#7838)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-04 13:32:11 +08:00
Necofish
323a82f4d5
[None][fix] fix error when processing batches containing both text and mm data (#8381)
Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn>
2025-12-04 14:28:24 +09:00
mpikulski
744f0eff1b
[TRTLLM-9522][fix] restore trtllm-serve mm_embedding_serve (#9669)
2025-12-03 19:27:11 -08:00
Wanli Jiang
4485e516a2
[None][feat] Update Qwen3CodeToolParser to align tool-calling parameters (#9540)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-04 06:47:32 +08:00
gramnarayan
098b9ff226
[#9147][feat] AutoDeploy: Draft Target Speculative Decoding (#9275)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2025-12-04 05:13:49 +08:00
Lucas Liebenwein
a1964bcbbc
[#9643][fix] AutoDeploy: fix nano sharding config (#9668)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-04 03:10:25 +08:00
Wei-Ming Chen
d9fba85396
[OMNIML-2932] [feat] nvfp4 awq support (#8698)
Signed-off-by: weimingc <17592131+meenchen@users.noreply.github.com>
2025-12-03 19:47:13 +02:00
Gal Hubara-Agam
d7bd62b1a0
[https://nvbugs/5693853][fix] Fix error handling when querying machin… (#9483)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2025-12-03 19:44:51 +02:00
Guoming Zhang
b5e2b9b51f
[https://nvbugs/5702795][fix] Remove the warning message for aten.log. (#9665)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-04 00:02:15 +08:00
Iman Tabrizian
09beaa5933
[None][fix] Fix wide ep MoE error (#9642)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-03 23:11:06 +08:00
Michal Guzek
4e5b10da48
[https://nvbugs/5552132][fix] Enable LoRa for GPT OSS Torch (#8253)
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
2025-12-03 15:42:15 +01:00
Perkz Zheng
992781dc7b
[None][feat] update trtllm-gen nvfp4 kernels with better performance (#9510)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2025-12-03 21:35:49 +08:00
JunyiXu-nv
743486b2ea
[TRTLLM-6842][feat] Support Response API for general purpose (#9392)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-12-03 16:49:26 +08:00
Pengyun Lin
1d4fb89235
[TRTLLM-8241][feat] Aliasing to comply to LlmArgs (#9586)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-12-03 15:28:45 +08:00
Bo Li
8b5ededc83
[TRTLLM-9391][chore] Automatically estimate required workspace. (#9535)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-12-03 12:49:38 +08:00
Suyog Gupta
93871d52b2
[None][chore] AutoDeploy update cuda stream manager for multi-device (#9575)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-12-02 20:43:14 -08:00
heyuhhh
a08eb81cce
[None][feat] Add RocketKV usage doc and e2e accuracy test on LongBenchV2 (#9572)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-03 11:33:46 +08:00
Anurag Mukkara
642dfae73a
[https://nvbugs/5698434][fix] Use separate weight mapper for draft (#9607)
Signed-off-by: Anurag Mukkara <134339030+amukkara@users.noreply.github.com>
2025-12-02 16:00:22 -08:00
Enwei Zhu
a3455f55c7
[None][chore] Fix trtllm-eval and move GroupedGemmInputsHelper (#9612)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-03 07:55:03 +08:00
Chang Liu
3916d032ec
[None][chore] Remove traceback dump for multimodal input processor (#9634)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-12-03 07:41:03 +08:00
Grzegorz Kwasniewski
0a7a88e74e
[TRTLLM-8946][feat] Improved heuristics to detect shardable regions (#9200)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 22:08:19 +01:00
Neta Zmora
a560ba5546
[#9550][feat] AutoDeploy: Add NVFP4 Cutlass MoE kernels (#9551)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-03 01:39:38 +08:00
Shi Xiaowei
227d42e492
[https://nvbugs/5651854][fix] Fix dist-serving perf by clearing CPU affinity (#9549)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2025-12-03 01:17:03 +08:00
Lucas Liebenwein
e72ce98c0f
[#9150][feat] AutoDeploy: reviewer comments for #9150 (#9527)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-12-02 12:09:10 -05:00
William Zhang
2dd3ebf037
[#9150][feat] Add code for nano v3 to custom implementation in AD (#9465)
* Why?

We would like to show an alternative to monkey-patching in AutoDeploy.

* What?

This commit builds on the existing custom model implementation for
NemotronH and adds the bits relevant for MoE layers.

Part of #9150.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-12-02 08:56:44 -08:00
Thor Johnsen
95049eea86
[https://nvbugs/5627710][fix] Fix synchronization bugs in KvCacheTransferManager that can cause corrupted blocks (#9056)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
Signed-off-by: Thor Johnsen <41591019+thorjohnsen@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-12-02 09:10:21 -06:00
Yan Chunwei
b86256eb54
[TRTLLM-9144][fix] enhance RPC robustness (#8711)
Signed-off-by: Superjomn <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2025-12-02 21:37:59 +08:00
Jin Li
21e3dc11d8
[https://nvbugs/5667774][fix] Refine Piecewise Cuda Graph Condition for DP (#9393)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2025-12-02 21:09:15 +08:00
brb-nv
be48cdf1d1
[TRTLLM-9466][test] Evaluate helix parallelism with DSV3 Lite (#9597)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-02 20:10:07 +08:00
mpikulski
84a1531594
[TRTLLM-9488][feat] use FlashInfer.sampling by default (#9545)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-12-02 16:29:55 +08:00
shuyixiong
1a2118b8fe
[https://nvbugs/5702793][fix] Fix uncontiguous tensor view (#9576)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-12-02 15:41:32 +08:00
Wanli Jiang
5657a00ec0
[FMDL-1328][feat] Add support for nano-v3 and super-v3 with pytorch backend (#9261)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2025-12-02 13:40:20 +08:00
Guoming Zhang
6fbe87c8b5
[None][chroe] Polish qwen3-next modeling code. (#8902)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2025-12-02 11:28:35 +08:00
Iman Tabrizian
356a52edf5
[None][feat] Add support for KVCache reuse for DSv32 (#9383)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2025-12-02 11:14:30 +08:00
Shijie
dcf5c86720
[None][feat] Unify nvfp4 gemm backend (#8963)
Signed-off-by: Shijie Wang <jaywan@nvidia.com>
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Shijie <jaywan@nvidia.com>
Co-authored-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-02 11:03:51 +08:00
Yuening Li
09c840184c
[None][fix] Prevent YAML partial kv_cache_config from incorrectly overriding the complete kv_cache_config (#9262)
Signed-off-by: Yuening Li <62227368+Yuening-wa@users.noreply.github.com>
2025-12-02 10:10:08 +08:00
Eran Geva
c9771ebb99
[#9198][feat] Refactor dist ops in AutoDeploy (#9301)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-12-02 02:36:32 +08:00
Chenghao Zhang
0a2104dce9
[None][feat] AutoDeploy: Use the router gemm op for nemotron MOE (#9500)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-12-01 10:24:31 -08:00
Venky
639c939a4f
[TRTC-1943][feat] Env vars override support in LLM API (#9104)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2025-12-01 10:04:49 -08:00
brb-nv
f61067cbb5
[None][chore] Defer exposing context parallel configs (#9552)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-12-01 09:50:02 -08:00
Stefan Niebler
f155812eb0
[TRTLLM-6756][feat] Add Beam Search to TorchSampler (#8509)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
2025-12-01 18:48:04 +01:00
Enwei Zhu
90345ad3f3
[None][fix] Skip Allreduce init for Attention DP (#9542)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 21:24:40 +08:00
alel
4107254c82
[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm (#9428)
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2025-12-01 18:10:45 +08:00
Yukun He
730eb3d859
[None][fix] Replace hash method with unique_id for cutedsl MoE runners. (#9569)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-12-01 17:02:33 +08:00
Neta Zmora
bc25fff039
[#9496][fix] AutoDeploy: remove auto-tuner from nvfp4_gemm forward (#9497)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
2025-12-01 10:04:39 +02:00
Fanrong Li
d69bf9f92a
[None][feat] add chat template kwargs support to longbench-v2 (#9544)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-12-01 15:59:13 +08:00
Gaoji Liu
9d2df04a72
[None][doc] fix mtp.py typo (#9307)
Signed-off-by: liugaoji <757394026@qq.com>
2025-11-30 21:55:13 -08:00
Li Min
1797e91dfd
[TRTLLM-6222][feat] Extend cute_dsl_nvfp4_gemm to sm103. (#9543)
Signed-off-by: Mindy Li <11663212+limin2021@users.noreply.github.com>
2025-12-01 10:19:36 +08:00
Enwei Zhu
34e2fa5c96
[https://nvbugs/5690172][fix] Fix Qwen3-235B ATP accuracy issue with PDL (#9530)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-12-01 09:10:21 +08:00
heyuhhh
6e470aab72
[None] [feat] Optimize the algorithm part of RocketKV (#9333)
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2025-12-01 09:04:09 +08:00
xxi
c12e67bb66
[TRTLLM-8958][feat] and [TRTLLM-8960]: create ConfigurableMoE and support TRTLLMGenFusedMoE as backend (#9486)
2025-12-01 08:37:07 +08:00
brb-nv
b77f4ffe54
[TRTLLM-5971][feat] Integrate helix parallelism (#9342)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-29 15:17:30 -08:00
dominicshanshan
6345074686
[None][chore] Weekly mass integration of release/1.1 -- rebase (#9522)
Signed-off-by: yunruis <205571022+yunruis@users.noreply.github.com>
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
Signed-off-by: qgai <qgai@nvidia.com>
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Simeng Liu <simengl@nvidia.com>
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Vincent Zhang <vinczhang@nvidia.com>
Signed-off-by: peaceh <103117813+peaceh-nv@users.noreply.github.com>
Signed-off-by: Michal Guzek <mguzek@nvidia.com>
Signed-off-by: Michal Guzek <moraxu@users.noreply.github.com>
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
Co-authored-by: yunruis <205571022+yunruis@users.noreply.github.com>
Co-authored-by: sunnyqgg <159101675+sunnyqgg@users.noreply.github.com>
Co-authored-by: brb-nv <169953907+brb-nv@users.noreply.github.com>
Co-authored-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
Co-authored-by: JunyiXu-nv <219237550+JunyiXu-nv@users.noreply.github.com>
Co-authored-by: Simeng Liu <109828133+SimengLiu-nv@users.noreply.github.com>
Co-authored-by: Guoming Zhang <137257613+nv-guomingz@users.noreply.github.com>
Co-authored-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Co-authored-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Co-authored-by: Vincent Zhang <vcheungyi@163.com>
Co-authored-by: peaceh-nv <103117813+peaceh-nv@users.noreply.github.com>
Co-authored-by: Michal Guzek <moraxu@users.noreply.github.com>
Co-authored-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Co-authored-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Shunkangz <182541032+Shunkangz@users.noreply.github.com>
Co-authored-by: Shunkang <182541032+Shunkangz@users.noreply.github.co>
Co-authored-by: QI JUN <22017000+QiJune@users.noreply.github.com>
2025-11-29 21:48:48 +08:00
Grzegorz Kwasniewski
cff54fcae3
[#8948][feat] Support custom sharding config (#9143)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2025-11-29 05:28:05 +08:00
binghanc
db5b876124
[None][feat] support for more accurate AR calculation (#9323)
Signed-off-by: binghanc <176802681+binghanc@users.noreply.github.com>
2025-11-29 00:34:21 +08:00
Matthias Jouanneaux
f8dd494536
[None][perf] Helix: improve all-to-all perf for large CP size (#9494)
Signed-off-by: Matthias Jouanneaux <mjoux@nvidia.com>
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
Co-authored-by: Zheyu Fu <zheyuf@nvidia.com>
2025-11-28 07:24:55 -08:00
mpikulski
e5f39ec7cf
[TRTLLM-9488][feat] add 'disable_flashinfer_sampling' config option (#9454)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-28 13:00:39 +01:00
Robin Kobus
5eae3650c3
[None][fix] Pass checkpoint_format to create_input_processor (#9521)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-28 10:32:29 +01:00
Zhenhuan Chen
7c3bb8534d
[None][chore] Revert "[None][fix] change allreduce workspace dtype to torch.int64 t… (#9538)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-28 16:45:23 +08:00
Yukun He
60c43a200a
[None][fix] Fix on-disk cache and revise logger/statistics for AutoTuner. (#9211)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-28 13:32:21 +08:00
Lucas Liebenwein
2f8bd6fb36
[#9150][feat] AutoDeploy Nemotron-Flash support (#9504)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2025-11-27 18:03:57 +01:00
Enwei Zhu
c2562fc800
[https://nvbugs/5687820][fix] Remove self.abort() in DetokenizedGenerationResult (#9449)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-27 22:54:40 +08:00
Bo Li
62b771877c
[TRTLLM-9389][chore] Refactor AlltoallMethodType. (#9388)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2025-11-27 21:09:29 +08:00
Fanrong Li
2d5eadf65f
[None][fix] fix TP support for DeepSeek-V3.2 on hopper (#9484)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-27 21:02:25 +08:00
Zhenhuan Chen
e47927e847
[None][fix] change allreduce workspace dtype to torch.int64 to avoid overflow (#9479)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2025-11-27 17:08:41 +08:00
xxi
f1ed057b4c
[cherry-pick][https://nvbugs/5670793][fix] Solve trtllm-serve launch_disaggregated issue (#9346)
Signed-off-by: xxi <xxi@nvidia.com>
2025-11-27 16:13:58 +08:00
Ziyi Xiong
1dd55d8507
[https://nvbugs/5698581][fix] Init draft tokens for CUDA graph dummy request (#9505)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2025-11-27 13:05:37 +08:00
Jiagan Cheng
14762e0287
[None][fix] Replace PYTORCH_CUDA_ALLOC_CONF with PYTORCH_ALLOC_CONF to fix deprecation warning (#9294)
Signed-off-by: Jiagan Cheng <jiaganc@nvidia.com>
2025-11-27 12:22:01 +08:00
QI JUN
a67d94963e
[None][chore] update comments in llm_args.py (#9472)
Signed-off-by: junq <22017000+QiJune@users.noreply.github.com>
2025-11-27 11:06:34 +08:00
Aurelien Chartier
f2f197360d
[#9463][feat] Add revision option to trtllm commands (#9498)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-11-27 09:30:01 +08:00
Chenghao Zhang
18fbda5cdb
[None][feat] AutoDeploy: Add A_log fusion for Mamba layers (#9422)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2025-11-26 14:39:20 -08:00
Chenghao Zhang
bc7b60e016
[None][feat] AutoDeploy: Remove redundant copies in mamba layers (#9461)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-26 14:38:33 -08:00
Aurelien Chartier
ef7ee6a940
[None][feat] Add environment variable to force spec-dec number of accepted tokens (#9371)
Signed-off-by: Aurelien Chartier <2567591+achartier@users.noreply.github.com>
2025-11-26 07:22:16 -08:00
Chang Liu
b10137fdd5
[None][feat] Support MLA chunked prefill for DeepSeek V3.2 model (#9376)
Signed-off-by: Chang Liu (Enterprise Products) <9713593+chang-l@users.noreply.github.com>
2025-11-26 16:38:25 +08:00
Enwei Zhu
1bf2d750a2
[None][chore] Upgrade CuteDSL to 4.3.0 (#9444)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2025-11-26 14:53:09 +08:00
JunyiXu-nv
b7308a4000
[https://nvbugs/5580099][fix] Cherry pick IMA issue fix from release/1.1 (#9032)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2025-11-26 13:09:06 +08:00
shuyixiong
d8acea1db3
[TRTLLM-9293][feat] Enable partial weight loading to support streaming update weights (#9224)
Signed-off-by: shuyix <219646547+shuyixiong@users.noreply.github.com>
2025-11-26 10:59:06 +08:00
Yiqing Yan
1b9edf62c9
[None][chore] Bump version to 1.2.0rc5 (#9455)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2025-11-26 08:37:53 +08:00
Chuang Zhu
0e9c7f8c07
[https://nvbugs/5685143][fix] avoid cudaFree overlap with cuda graph (#9438)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2025-11-25 16:20:29 -08:00
Suyog Gupta
e484bec82f
[None][chore] AutoDeploy add multi stream moe pass to default.yaml (#9430)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-25 14:16:13 -08:00
Robin Kobus
32f53910ef
[TRTLLM-909][feat] Overlap context chunks in pipeline parallel mode (#9308)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2025-11-25 22:11:51 +01:00
Eran Geva
afc52d7b93
[https://nvbugs/5647400] [fix] Enlarged the AllReduce workspace size to 64MB. Added AllReduce strategy to AD config. (#9145)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2025-11-25 10:56:07 -08:00
mpikulski
899fda9e47
[TRTLLM-9490][feat] use FlashInfer's top_k_sampling_from_probs (#9457)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:53:53 +01:00
mpikulski
c5f52ab304
[TRTLLM-8376][feat] top-p optimization (removes redundant softmax) (#9411)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2025-11-25 18:46:48 +01:00
YueWeng
cc336c4abd
[TRTLLM-8160][feat] Add draft token tree runtime on CDL (#8586)
Signed-off-by: Yue Weng <25103990+yweng0828@users.noreply.github.com>
2025-11-25 09:40:55 -05:00
Pengyun Lin
fa61825c74
[None][feat] Support custom chat template for tool calling (#9297)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2025-11-25 22:07:04 +08:00
Tailing Yuan
51ef0379d2
[None][feat] Add a parser to layer-wise benchmarks (#9440)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2025-11-25 05:45:16 -08:00
Fanrong Li
c36f144591
[None][chore] Fix trtllm-eval for PyTorchLLM (#9427)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2025-11-25 04:49:03 -08:00
Yueh-Ting (eop) Chen
a38d91aae2
[https://nvbugs/5537996][fix] Let KV cache manager block initialization be aware whether it is doing a dry run or not (#9093)
Before this commit, the kv cache manager does the same regardless, which causes a mis-calculation in free memory available to allocate for the KV cache manager, hence causing a crash.

This commit fixes this by letting KV cache manager initialization be aware whether it is doing the dry run or not. If it is a dry run, use the max_tokens setting that is already pre-calculated and filled into kv_cache_config.max_tokens.

Signed-off-by: eopXD <yuehtingc@nvidia.com>
2025-11-25 17:27:11 +08:00
Yukun He
e580da4155
[TRTLLM-7963][feat] Cold L2 cache when doing autotune benchmarking. (#8779)
The performance results of some kernels could be easily affected by the warm/cold L2 cache status. To achieve more precise profiling results, the L2 cache is cleared for every execution by the circular buffer method for better benchmarking during autotuning.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-25 15:06:22 +08:00
William Zhang
a4049fc557
[#9413][fix] Minor fixes to nemotron H and custom models in AD (#9416)
* Why?

There were a couple of issues with the recently merged custom model
injection for AutoDeploy + the reference implementation of nemotron
H:
- `d_mlp` was left in despite being mathematically always null (could
  lead to runtime issues during sharding).
- the custom model mapping was inherited by children factories.

* What?

This commit fixes these issues, and refactors the key of the custom
implementation to be based on the name of the configuration class as
well.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2025-11-24 20:17:33 -08:00
Suyog Gupta
efd503751f
[#9271][perf] Enable multi-stream MOE optimization in AutoDeploy (#9322)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2025-11-24 19:50:10 -08:00
Yuxian Qiu
8a0295015f
[None][chore] Reduce nested nvtx ranges. (#9347)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2025-11-25 09:58:41 +08:00
bhsueh_NV
1a93583438
[None][feat] Support Yarn on QwQ-32B model (#9059)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
Signed-off-by: Jiang Shao <91270701+StudyingShao@users.noreply.github.com>
Co-authored-by: NVJiangShao <91270701+StudyingShao@users.noreply.github.com>
2025-11-25 07:27:28 +08:00
Yibin Li
1ce483c999
[TRTLLM-7967][feat] Adding Starcoder2 PyTorch Backend Support (#8923)
Signed-off-by: Yibin Li <109242046+yibinl-nvidia@users.noreply.github.com>
2025-11-24 11:23:22 -08:00
Yukun He
960851f419
[None][chore] Remove unnecessary log in the short tuning profile (#9387)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-24 12:31:26 +08:00
Yukun He
39076410a8
[https://nvbugs/5676748][fix] Fix mismatched nvfp4 gemm sf shape. (#9336)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2025-11-24 12:16:32 +08:00
brb-nv
c045e359a7
[https://nvbugs/5637012][fix] Fix helix unit tests (#9369)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2025-11-23 19:34:22 -08:00