Commit Graph

1454 Commits

Author SHA1 Message Date
Grzegorz Kwasniewski
7bf4dd9f63
[TRTLLM-10318][feat] Fixing Nemotron sharding: support for sharding buffers (#10319)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Lucas <11156568+lucaslie@users.noreply.github.com>
Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
Co-authored-by: Lucas <11156568+lucaslie@users.noreply.github.com>
2026-01-17 04:02:06 -05:00
Chenghao Zhang
0b748d5bba
[None][chore] update flashinfer to 0.6.0 (#10522)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 16:22:06 -05:00
Chenghao Zhang
b6acd96616
[None][fix] AutoDeploy: Fix the nvfp4 fused_moe (#10727)
Signed-off-by: nvchenghaoz <211069071+nvchenghaoz@users.noreply.github.com>
2026-01-16 12:04:40 -08:00
Stefan Niebler
0cfd08745c
[TRTLLM-9735][feat] Add processed logprobs functionality to TorchSampler (#9675)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Signed-off-by: Erin Ho <14718778+hchings@users.noreply.github.com>
Co-authored-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
Co-authored-by: Erin Ho <14718778+hchings@users.noreply.github.com>
2026-01-16 10:52:41 -08:00
Wanli Jiang
722978b837
[TRTLLM-10305][feat] Support customized seq len larger than model config (#10600)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-16 16:07:36 +08:00
dongfengy
6dfb8d7084
[None][fix] Fix Piecewise Cuda Graph for GPTOSS (#10631)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-16 15:47:34 +08:00
Yukun He
f001c4946d
[https://nvbugs/5782112][fix] Fix hanging issue for MNNVL Allreduce under PP (#10633)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-16 13:03:36 +08:00
Enwei Zhu
7b8b9ccbaf
[https://nvbugs/5669671][fix] Support GuidedDecoder with sharded logits (#10698)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-16 11:04:26 +08:00
Lucas Liebenwein
49c6f73554
[None][bug] AutoDeploy: fix regression in kv cache resize memory estimation (#10726)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-16 09:52:03 +08:00
Lizhi Zhou
93db0d5e18
[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg (#10406)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 19:18:21 +08:00
Lizhi Zhou
ff277b591e
[https://nvbugs/5791830][fix] fix pp loop hang caused by i-sending new requests (#10665)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 16:33:55 +08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias (#10099)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Void
f7de285a82
[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api (#10072)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-14 22:15:29 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel (#9905)
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 (#9818)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
HuiGao-NV
b10704428d
[https://nvbugs/5787566][fix] Only keep a limited number of performance statistic data (#10569)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-14 07:53:01 -05:00
Kyungmin Lee
25148d3fee
[None][feat] Support new Transformers RoPE configuration format (#10636)
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-14 19:41:27 +09:00
xxi
e9817461ba
[None][chore] improve the readability of log for cutlass can only sup… (#10630)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:33:45 -05:00
xxi
d8862505b9
[None][chore] enable EPLB for DEEPGEMM (#10617)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:28:08 -05:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support (#10532)
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
Yukun He
15281de799
[None][fix] Reduce host overhead for unified nvfp4 gemm tuning path. (#10503)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-14 14:26:18 +08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. (#10380)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
Leslie Fang
795e690bca
[https://nvbugs/5753788][chore] Padding empty chunk for configurable moe (#10451)
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 10:42:17 +08:00
Yuxian Qiu
d3f4fbb742
[None][fix] Avoid write-write race for async pp send. (#10488)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:39:36 +08:00
Yuxian Qiu
2acd03030a
[https://nvbugs/5781589][fix] Implement pp skip forward for all spec workers. (#10578)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:36:35 +08:00
Balaram Buddharaju
ccdfa43a6e
[https://nvbugs/5791900][fix] Fix HelixCpMnnvlMemory init with PP (#10533)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-13 15:48:42 -05:00
Frida Hou
bf16fbd86c
[#9283][feat] AutoDeploy: separate rms pattern detection from fusion (#9969)
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-13 14:57:27 -05:00
Neta Zmora
7b7f1e2ba1
[None][feat] AutoDeploy: refactor memory usage logging (#8505)
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-01-13 21:03:09 +02:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. (#10480)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] support to parse the keyword modules_to_not_convert of the HF model config (#10527)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Yuxian Qiu
80f261ea36
[https://nvbugs/5622938][feat] Run sample_async on extra stream. (#10215)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-09 18:15:18 +08:00
Chang Liu
78bb245554
[https://nvbugs/5787453][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552)
2026-01-09 00:49:39 -08:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Yuxian Qiu
afa55c12b6
[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445. (#10547)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 21:50:04 -05:00
Mike Iovine
4092a87b6f
[https://nvbugs/5740075][fix] Fix sm120 speculation (#10049)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
Eran Geva
489dd60312
[#10513][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor (#10512)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 14:49:40 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL (#10470)
* Why?

We would like to support EPD disaggregated serving for Qwen3 VL.

* What?

This commit adds such support, and extends existing unit tests for
correctness checks.

Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
Eran Geva
6511dbaea0
[#10417][fix] AutoDeploy - Reverted to direct computation of minusA (#10509)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 13:43:41 +02:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yukun He
09d9878385
[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. (#10339)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-08 10:21:02 +08:00
Ziyi Xiong
7187afe7b9
[https://nvbugs/5781589][fix] Skip spec dec for non-last rank (#10445)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-01-07 13:55:45 -05:00
tcherckez-nvidia
7e88212d24
[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct (#10455)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-07 10:30:24 +02:00