TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-12 05:53:33 +08:00

Author	SHA1	Message	Date
Izzy Putterman	864b61cadd	[None][feat] Speculative One Model: FlashInfer sampling (#10284 ) Signed-off-by: Izzy Putterman <iputterman@nvidia.com>	2026-01-20 12:56:43 -05:00
jthomson04	2db3d7eeba	[None][chore] Async Transfer Manager (#9891 ) Signed-off-by: jthomson04 <jwillthomson19@gmail.com>	2026-01-20 12:12:47 -05:00
Yanchao Lu	ae8f74b620	[None][chore] Reduce tedious logs (#10847 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-20 22:56:24 +08:00
Grzegorz Kwasniewski	eb326073d8	[TRTLLM-10785][feat] Fix sharding dashboard errors (#10786 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-20 09:25:36 +01:00
Yi Zhang	58311b2345	[None][fix] Remove unused params in attn (#10652 ) Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>	2026-01-20 03:08:59 -05:00
benzh-2025	4c8468c5d3	[None][fix] default disable gemm+allreduce fusion (#10656 )	2026-01-20 12:31:17 +08:00
Bo Li	f3a985ce27	[TRTLLM-10296][fix] Fix the potential misaligned access due to vectorized ld/st instructions in NVLinkOneSided A2A. (#10539 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-20 11:08:04 +08:00
Liao Lanyu	dbb858ae0c	[TRTLLM-10029][scheduler] Re-implement MicroBatchScheduler and CapacityScheduler in Python (#10273 ) Signed-off-by: junq <22017000+QiJune@users.noreply.github.com> Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com> Co-authored-by: junq <22017000+QiJune@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2026-01-20 10:31:13 +08:00
SamareshSingh	64ff5cac52	[None][chore] docs: clarify LoRA is not supported with --use_fp8_rowwise in Fp8RowwiseAttention (see #2603 ) (#10320 ) Signed-off-by: Samaresh Kumar Singh <ssam3003@gmail.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-19 04:38:00 -05:00
Lucas Liebenwein	9879400479	[#10642 ][feat] AutoDeploy: optimized canonicalize_graph utilities [1/2] (#10675 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-18 13:42:30 -05:00
Eran Geva	4d2916d683	[#10688 ][fix] AutoDeploy Fix CUDA graph batch sizes exceeding max_batch_size (#10687 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 13:31:01 -05:00
Eran Geva	a11f0dbd61	[#10696 ][fix] AutoDeploy prevent torch.export from specializing batch dimension when max_batch_size=1 (#10697 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-18 10:42:49 +02:00
Grzegorz Kwasniewski	7bf4dd9f63	[TRTLLM-10318][feat] Fixing Nemotron sharding: support for sharding buffers (#10319 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Lucas <11156568+lucaslie@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com> Co-authored-by: Lucas <11156568+lucaslie@users.noreply.github.com>	2026-01-17 04:02:06 -05:00
Yuxian Qiu	cef67b4f8d	[None][fix] convert to CUDA tensor before calling _resmooth_kernel. (#10770 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-17 16:18:34 +08:00
Chenghao Zhang	0b748d5bba	[None][chore] update flashinfer to 0.6.0 (#10522 ) Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 16:22:06 -05:00
Chenghao Zhang	b6acd96616	[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727 ) Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>	2026-01-16 12:04:40 -08:00
Stefan Niebler	0cfd08745c	[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com> Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com> Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com> Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>	2026-01-16 10:52:41 -08:00
Wanli Jiang	722978b837	[TRTLLM-10305][feat] Support customized seq len larger than model config (#10600 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-16 16:07:36 +08:00
dongfengy	6dfb8d7084	[None][fix] Fix Piecewise Cuda Graph for GPTOSS (#10631 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>	2026-01-16 15:47:34 +08:00
Necofish	03cdf5804f	[None][fix] impl fused triton kernel for e8m0 resmooth to reduce memory footprint (#10327 ) Signed-off-by: Nekofish-L <liuxiangyang@mail.ustc.edu.cn> Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>	2026-01-15 22:13:18 -08:00
Yukun He	f001c4946d	[https://nvbugs/5782112 ][fix] Fix hanging issue for MNNVL Allreduce under PP (#10633 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-16 13:03:36 +08:00
Enwei Zhu	7b8b9ccbaf	[https://nvbugs/5669671 ][fix] Support GuidedDecoder with sharded logits (#10698 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-16 11:04:26 +08:00
Lucas Liebenwein	49c6f73554	[None][bug] AutoDeploy: fix regression in kv cache resize memory estimation (#10726 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-16 09:52:03 +08:00
heyuhhh	dfac07c045	[None][feat] Support to export data in trtllm-eval (#10075 ) Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>	2026-01-15 23:27:08 +08:00
Lizhi Zhou	93db0d5e18	[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-15 19:18:21 +08:00
Lizhi Zhou	ff277b591e	[https://nvbugs/5791830 ][fix] fix pp loop hang caused by i-sending new requests (#10665 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-15 16:33:55 +08:00
Yiqing Yan	f4ace99218	[None][chore] Bump version to 1.3.0rc0 (#10681 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-15 13:55:44 +08:00
Anish Shanbhag	faa80e73fd	[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-14 21:06:07 -08:00
Void	f7de285a82	[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api (#10072 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2026-01-14 22:15:29 -05:00
彭晋韬(jtao peng)	211c44b951	[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905 ) Signed-off-by: jintaop <jintaop@nvidia.com>	2026-01-15 07:29:15 +08:00
Tzu-Ling Kan	c99faaed06	[#9760 ][fix] Use RequestError for validation errors to prevent engine shutdown (#9761 ) Signed-off-by: tzulingk@nvidia.com <tzulingk@nvidia.com>	2026-01-14 10:22:36 -05:00
Emma Qiao	01083b56bf	[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Signed-off-by: xxi <xxi@nvidia.com> Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com> Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com> Co-authored-by: xxi <xxi@nvidia.com> Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-14 21:54:04 +08:00
HuiGao-NV	b10704428d	[https://nvbugs/5787566 ][fix] Only keep a limited number of performance statistic data (#10569 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-14 07:53:01 -05:00
Kyungmin Lee	25148d3fee	[None][feat] Support new Transformers RoPE configuration format (#10636 ) Signed-off-by: lkm2835 <lkm2835@gmail.com>	2026-01-14 19:41:27 +09:00
xxi	e9817461ba	[None][chore] improve the readability of log for cutlass can only sup… (#10630 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-14 05:33:45 -05:00
xxi	d8862505b9	[None][chore] enable EPLB for DEEPGEMM (#10617 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-14 05:28:08 -05:00
jmydurant	e7882d5c74	[None][feat] MiniMax M2 support (#10532 ) Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>	2026-01-14 17:38:58 +08:00
mpikulski	052c36ddd2	[TRTLLM-9522][feat] support image_embeds in OpenAI API (#9715 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-14 10:31:03 +01:00
Zhenhuan Chen	287f6c2e0f	[None][test] add log_samples and output_path for trtllm_eval (#10629 ) Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>	2026-01-14 16:01:38 +08:00
Yukun He	15281de799	[None][fix] Reduce host overhead for unified nvfp4 gemm tuning path. (#10503 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-14 14:26:18 +08:00
Yuxian Qiu	39cefd6125	[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 14:05:47 +08:00
Leslie Fang	795e690bca	[https://nvbugs/5753788 ][chore] Padding empty chunk for configurable moe (#10451 ) Signed-off-by: leslie-fang25 <leslief@nvidia.com>	2026-01-14 10:42:17 +08:00
Yuxian Qiu	d3f4fbb742	[None][fix] Avoid write-write race for async pp send. (#10488 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 09:39:36 +08:00
Yuxian Qiu	2acd03030a	[https://nvbugs/5781589 ][fix] Implement pp skip forward for all spec workers. (#10578 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-14 09:36:35 +08:00
Balaram Buddharaju	ccdfa43a6e	[https://nvbugs/5791900 ][fix] Fix HelixCpMnnvlMemory init with PP (#10533 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-13 15:48:42 -05:00
Frida Hou	bf16fbd86c	[#9283 ][feat] AutoDeploy: separate rms pattern detection from fusion (#9969 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-13 14:57:27 -05:00
Neta Zmora	7b7f1e2ba1	[None][feat] AutoDeploy: refactor memory usage logging (#8505 ) Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com> Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com> Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>	2026-01-13 21:03:09 +02:00
benzh-2025	6df2c8a074	[None][feat] add fp4 gemm + allreduce (#9729 ) Signed-off-by: benzh Signed-off-by: benzh-2025	2026-01-13 21:11:13 +08:00
Tailing Yuan	38296a472b	[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-13 19:17:03 +08:00
Void	7d16f3a28b	[https://nvbugs/5788127 ][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499 ) Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>	2026-01-13 17:16:22 +08:00
Guoming Zhang	bdaee87895	[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347 ) Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>	2026-01-13 17:13:55 +08:00
JunyiXu-nv	e291a834db	[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} (#9937 ) Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>	2026-01-13 03:57:14 -05:00
Yuxian Qiu	04b112651b	[None][feat] Hang detection for executor loop and worker. (#10480 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-13 02:34:32 -05:00
xxi	ba1037ca4a	[https://nvbugs/5762336 ][fix] support to parse the keyword modules_to_not_convert of the HF model config" (#10527 ) Signed-off-by: xxi <xxi@nvidia.com>	2026-01-12 20:21:01 -05:00
Iman Tabrizian	48b09e5a25	[https://nvbugs/5689235 ][fix] Fix cancellation+chunked prefill+disagg (#10111 ) Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>	2026-01-12 18:23:26 -05:00
Gal Hubara-Agam	18a33764b5	[None][chore] Print correct backend name in benchmark report (#10597 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-12 14:46:00 -05:00
Xianjie Qiao	3a9a00b544	[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401 ) Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>	2026-01-12 14:10:31 +08:00
Yechan Kim	8e0d20d901	[TRTLLM-10195][feat] K-EXAONE support (#10355 ) Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com> Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com> Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>	2026-01-12 00:29:51 +09:00
Faraz	fdbdbba540	[https://nvbugs/5752687 ][fix] Choose register model config over root config for VLM (#10553 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>	2026-01-09 12:10:52 -05:00
Yechan Kim	7295af68ba	[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435 ) Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>	2026-01-10 00:13:26 +09:00
Kaiyu Xie	1c69aad850	[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-09 09:50:57 -05:00
Yuxian Qiu	80f261ea36	[https://nvbugs/5622938 ][feat] Run sample_async on extra stream. (#10215 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-09 18:15:18 +08:00
Chang Liu	78bb245554	[https://nvbugs/5787453 ][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552 )	2026-01-09 00:49:39 -08:00
JadoTu	4c498bfe58	[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873 ) Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>	2026-01-09 14:50:16 +08:00
Yuxian Qiu	afa55c12b6	[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445 . (#10547 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-08 21:50:04 -05:00
Mike Iovine	4092a87b6f	[https://nvbugs/5740075 ][fix] Fix sm120 speculation (#10049 ) Signed-off-by: Mike Iovine <miovine@nvidia.com>	2026-01-08 19:55:43 -05:00
Eran Geva	489dd60312	[#10513 ][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor (#10512 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-08 14:49:40 -05:00
mpikulski	e0331297a6	[TRTLLM-9522][fix] broken cast (#9975 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>	2026-01-08 06:47:39 -05:00
William Zhang	c0ae6bbdbe	[None][feat] EPD for Qwen3 VL (#10470 ) * Why? We would like to support EPD disaggregated serving for Qwen3 VL. * What? This commit adds such support, and extends existing unit tests for correctness checks. Some minor (protected) interface changes had to be made to the weight mapper as a side-effect. Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>	2026-01-08 06:45:54 -05:00
Eran Geva	6511dbaea0	[#10417 ][fix] AutoDepoloy - Reverted to direct computation of minusA (#10509 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-08 13:43:41 +02:00
bhsueh_NV	bea61bb17d	[None][fix] Mistral large 3 few code refine (#10405 ) Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>	2026-01-08 06:38:49 -05:00
Yiqing Yan	dc6b743fb6	[None][chore] Bump version to 1.2.0rc8 (#10542 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-08 04:51:44 -05:00
Yukun He	09d9878385	[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. (#10339 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-08 10:21:02 +08:00
Ziyi Xiong	7187afe7b9	[https://nvbugs/5781589 ][fix] Skip spec dec for non-last rank (#10445 ) Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>	2026-01-07 13:55:45 -05:00
tcherckez-nvidia	7e88212d24	[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct (#10455 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2026-01-07 10:30:24 +02:00
Kanghwan	dc32bac9fc	[#4745 ][fix] Pass lora_params through Qwen2/3 model forward (#10174 ) Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>	2026-01-07 15:30:17 +08:00
Fanrong Li	a34aa63685	[https://nvbugs/5767223 ][feat] add pp support for DeepSeek-v3.2 (#10449 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-07 12:29:51 +08:00
Zongfei Jing	bb2f883296	[None] [feat] Add test script and raster M for gather fc1 kernel (#10429 ) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2026-01-07 09:31:49 +08:00
Lucas Liebenwein	bb6a3973aa	[https://nvbugs/5732942 ][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-06 19:55:49 -05:00
Lizhi Zhou	6a4bebcd01	[None][chore] remove redundant retries while binding to arbitrary port (#10452 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-06 10:39:15 -05:00
Kaiyu Xie	2eaabd7461	[None] [fix] Fix undefined tokens_per_block (#10438 ) Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>	2026-01-06 02:42:37 -05:00
Karthik	617f728903	[#8460 ][feat] Revive and simplify Model Explorer visualization integration (#10150 ) Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>	2026-01-05 22:15:25 -05:00
Xiao Xuan	46f035befe	[#2511 ][fix] eagle: qwen2 capture hidden states (#10091 ) Signed-off-by: SpicyNoodle <522169030@qq.com>	2026-01-05 21:46:41 -05:00
alel	6b8ae6fa81	[None][feat] CuteDSL MOE FC1 Enhancement (#10088 ) Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>	2026-01-06 09:30:43 +08:00
JadoTu	82aaf98070	[None][feat] add the eos tokens in generation config to stop words in the sampler (#10389 ) Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>	2026-01-06 09:24:03 +08:00
Karthik	4e50cb5708	[#10170 ][fix] Add export patch for GraniteMoe MoE models to enable torch.export compatibility (#10169 ) Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>	2026-01-05 16:13:45 -05:00
Grzegorz Kwasniewski	ea380ff45c	[TRTLLM-9767][feat] Fixed recursive node traversals (#10379 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-05 18:42:06 +02:00
Mike Iovine	db2614ef10	[https://nvbugs/5772414 ][fix] Fix draft token tree depth=1 corner case (#10385 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 17:20:14 +01:00
Mike Iovine	bedfff4f00	[https://nvbugs/5772521 ][fix] Fix draft token tree chain crash (#10386 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2026-01-05 17:18:44 +01:00
Anthony Chang	225d3a9001	[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998 ) Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>	2026-01-05 17:16:12 +01:00
Balaram Buddharaju	a792c23dcf	[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-05 20:08:03 +08:00
Eran Geva	3749a2ce1c	[#10374 ][fix] fixed race condition in AutoDeploy's mp tests port acquisition (#10366 ) Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>	2026-01-05 13:33:01 +02:00
Fanrong Li	4931c5eb3a	[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 16:43:42 +08:00
Yukun He	d272f1a9bc	[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 15:44:37 +08:00
HuiGao-NV	2f768b76f8	[https://nvbugs/5715568 ][fix] Force release torch memory when LLM is destroyed (#10314 ) Signed-off-by: Hui Gao <huig@nvidia.com>	2026-01-05 15:30:18 +08:00
Pengyun Lin	c04cf4334e	[TRTLLM-8242][feat] Add stability tags for serve subcommand (#10012 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-05 14:16:15 +08:00
Yukun He	0937df2c68	[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 13:44:09 +08:00
Tailing Yuan	a7fe043b13	[None][feat] Layer-wise benchmarks: support TEP balance, polish slurm scripts (#10237 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-05 11:23:04 +08:00
Cheng Hang	656c705ff1	[None][feat] sm100 weight-only kernel (#10190 ) Signed-off-by: Cheng Hang <chang@nvidia.com>	2026-01-05 09:44:36 +08:00
Fanrong Li	b5a1e10bc0	[https://nvbugs/5779534 ][fix] fix buffer reuse for CUDA graph attention metadata (#10393 ) Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>	2026-01-05 09:43:44 +08:00

2083 Commits