Commit Graph

1454 Commits

Author SHA1 Message Date
Grzegorz Kwasniewski
7bf4dd9f63
[TRTLLM-10318][feat] Fixing Nemotron sharding: support for sharding buffers (#10319)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Lucas <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas <11156568+lucaslie@users.noreply.github.com>
2026-01-17 04:02:06 -05:00
Chenghao Zhang
0b748d5bba
[None][chore] update flashinfer to 0.6.0 (#10522)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 16:22:06 -05:00
Chenghao Zhang
b6acd96616
[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 12:04:40 -08:00
Stefan Niebler
0cfd08745c
[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2026-01-16 10:52:41 -08:00
Wanli Jiang
722978b837
[TRTLLM-10305][feat] Support customized seq len larger than model config (#10600)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-16 16:07:36 +08:00
dongfengy
6dfb8d7084
[None][fix] Fix Piecewise Cuda Graph for GPTOSS (#10631)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-16 15:47:34 +08:00
Yukun He
f001c4946d
[https://nvbugs/5782112][fix] Fix hanging issue for MNNVL Allreduce under PP (#10633)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-16 13:03:36 +08:00
Enwei Zhu
7b8b9ccbaf
[https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-16 11:04:26 +08:00
Lucas Liebenwein
49c6f73554
[None][bug] AutoDeploy: fix regression in kv cache resize memory estimation (#10726)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-16 09:52:03 +08:00
Lizhi Zhou
93db0d5e18
[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 19:18:21 +08:00
Lizhi Zhou
ff277b591e
[https://nvbugs/5791830][fix] fix pp loop hang caused by i-sending new requests (#10665)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 16:33:55 +08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Void
f7de285a82
[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api (#10072)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-14 22:15:29 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905)
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
HuiGao-NV
b10704428d
[https://nvbugs/5787566][fix] Only keep a limited number of performance statistic data (#10569)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-14 07:53:01 -05:00
Kyungmin Lee
25148d3fee
[None][feat] Support new Transformers RoPE configuration format (#10636)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-14 19:41:27 +09:00
xxi
e9817461ba
[None][chore] improve the readability of log for cutlass can only sup… (#10630)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:33:45 -05:00
xxi
d8862505b9
[None][chore] enable EPLB for DEEPGEMM (#10617)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:28:08 -05:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support (#10532)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
Yukun He
15281de799
[None][fix] Reduce host overhead for unified nvfp4 gemm tuning path. (#10503)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-14 14:26:18 +08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
Leslie Fang
795e690bca
[https://nvbugs/5753788][chore] Padding empty chunk for configurable moe (#10451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 10:42:17 +08:00
Yuxian Qiu
d3f4fbb742
[None][fix] Avoid write-write race for async pp send. (#10488)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:39:36 +08:00
Yuxian Qiu
2acd03030a
[https://nvbugs/5781589][fix] Implement pp skip forward for all spec workers. (#10578)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:36:35 +08:00
Balaram Buddharaju
ccdfa43a6e
[https://nvbugs/5791900][fix] Fix HelixCpMnnvlMemory init with PP (#10533)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-13 15:48:42 -05:00
Frida Hou
bf16fbd86c
[#9283][feat] AutoDeploy: separate rms pattern detection from fusion (#9969)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-13 14:57:27 -05:00
Neta Zmora
7b7f1e2ba1
[None][feat] AutoDeploy: refactor memory usage logging (#8505)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-01-13 21:03:09 +02:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. (#10480)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] support to parse the keyword modules_to_not_convert of the HF model config (#10527)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Yuxian Qiu
80f261ea36
[https://nvbugs/5622938][feat] Run sample_async on extra stream. (#10215)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-09 18:15:18 +08:00
Chang Liu
78bb245554
[https://nvbugs/5787453][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552)
2026-01-09 00:49:39 -08:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Yuxian Qiu
afa55c12b6
[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445. (#10547)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 21:50:04 -05:00
Mike Iovine
4092a87b6f
[https://nvbugs/5740075][fix] Fix sm120 speculation (#10049)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
Eran Geva
489dd60312
[#10513][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor (#10512)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 14:49:40 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL (#10470)
* Why?

We would like to support EPD disaggregated serving for Qwen3 VL.

* What?

This commit adds such support, and extends existing unit tests for
correctness checks.

Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
Eran Geva
6511dbaea0
[#10417][fix] AutoDeploy - Reverted to direct computation of minusA (#10509)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 13:43:41 +02:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yukun He
09d9878385
[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. (#10339)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-08 10:21:02 +08:00
Ziyi Xiong
7187afe7b9
[https://nvbugs/5781589][fix] Skip spec dec for non-last rank (#10445)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-01-07 13:55:45 -05:00
tcherckez-nvidia
7e88212d24
[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct (#10455)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-07 10:30:24 +02:00