Commit Graph

1986 Commits

Author SHA1 Message Date
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
Signed-off-by: benzh 
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Void
7d16f3a28b
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
JunyiXu-nv
e291a834db
[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} (#9937)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-13 03:57:14 -05:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. (#10480)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] support to parse the keyword modules_to_not_convert of the HF model config" (#10527)
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Gal Hubara-Agam
18a33764b5
[None][chore] Print correct backend name in benchmark report (#10597)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-12 14:46:00 -05:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Faraz
fdbdbba540
[https://nvbugs/5752687][fix] Choose register model config over root config for VLM (#10553)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-09 12:10:52 -05:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Yuxian Qiu
80f261ea36
[https://nvbugs/5622938][feat] Run sample_async on extra stream. (#10215)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-09 18:15:18 +08:00
Chang Liu
78bb245554
[https://nvbugs/5787453][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552) 2026-01-09 00:49:39 -08:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Yuxian Qiu
afa55c12b6
[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445. (#10547)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 21:50:04 -05:00
Mike Iovine
4092a87b6f
[https://nvbugs/5740075][fix] Fix sm120 speculation (#10049)
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
Eran Geva
489dd60312
[#10513][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor (#10512)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 14:49:40 -05:00
mpikulski
e0331297a6
[TRTLLM-9522][fix] broken cast (#9975)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-08 06:47:39 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL (#10470)
* Why?

We would like to support EPD disaggregated serving for Qwen3 VL.

* What?

This commit adds such support, and extends existing unit tests for
correctness checks.

Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.

Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
Eran Geva
6511dbaea0
[#10417][fix] AutoDepoloy - Reverted to direct computation of minusA (#10509)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 13:43:41 +02:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yiqing Yan
dc6b743fb6
[None][chore] Bump version to 1.2.0rc8 (#10542)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-08 04:51:44 -05:00
Yukun He
09d9878385
[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. (#10339)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-08 10:21:02 +08:00
Ziyi Xiong
7187afe7b9
[https://nvbugs/5781589][fix] Skip spec dec for non-last rank (#10445)
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-01-07 13:55:45 -05:00
tcherckez-nvidia
7e88212d24
[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct (#10455)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-07 10:30:24 +02:00
Kanghwan
dc32bac9fc
[#4745][fix] Pass lora_params through Qwen2/3 model forward (#10174)
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2026-01-07 15:30:17 +08:00
Fanrong Li
a34aa63685
[https://nvbugs/5767223][feat] add pp support for DeepSeek-v3.2 (#10449)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
Zongfei Jing
bb2f883296
[None] [feat] Add test script and raster M for gather fc1 kernel (#10429)
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-07 09:31:49 +08:00
Lucas Liebenwein
bb6a3973aa
[https://nvbugs/5732942][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 19:55:49 -05:00
Lizhi Zhou
6a4bebcd01
[None][chore] remove redundant retries while binding to arbitrary port (#10452)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-06 10:39:15 -05:00
Kaiyu Xie
2eaabd7461
[None] [fix] Fix undefined tokens_per_block (#10438)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 02:42:37 -05:00
Karthik
617f728903
[#8460][feat] Revive and simplify Model Explorer visualization integration (#10150)
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-05 22:15:25 -05:00
Xiao Xuan
46f035befe
[#2511][fix] eagle: qwen2 capture hidden states (#10091)
Signed-off-by: SpicyNoodle <522169030@qq.com>
2026-01-05 21:46:41 -05:00
alel
6b8ae6fa81
[None][feat] CuteDSL MOE FC1 Enhancement (#10088)
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2026-01-06 09:30:43 +08:00
JadoTu
82aaf98070
[None][feat] add the eos tokens in generation config to stop words in the sampler (#10389)
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2026-01-06 09:24:03 +08:00
Karthik
4e50cb5708
[#10170][fix] Add export patch for GraniteMoe MoE models to enable torch.export compatibility (#10169)
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-05 16:13:45 -05:00
Grzegorz Kwasniewski
ea380ff45c
[TRTLLM-9767][feat] Fixed recursive node traversals (#10379)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-05 18:42:06 +02:00
Mike Iovine
db2614ef10
[https://nvbugs/5772414][fix] Fix draft token tree depth=1 corner case (#10385)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:20:14 +01:00
Mike Iovine
bedfff4f00
[https://nvbugs/5772521][fix] Fix draft token tree chain crash (#10386)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:18:44 +01:00
Anthony Chang
225d3a9001
[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998)
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-05 17:16:12 +01:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00
Eran Geva
3749a2ce1c
[#10374][fix] fixed race condition in AutoDeploy's mp tests port acquisition (#10366)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-05 13:33:01 +02:00
Fanrong Li
4931c5eb3a
[None][feat] update deepgemm to the DeepGEMM/nv_dev branch (#9898)
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-05 16:43:42 +08:00
Yukun He
d272f1a9bc
[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 15:44:37 +08:00
HuiGao-NV
2f768b76f8
[https://nvbugs/5715568][fix] Force release torch memory when LLM is destroyed (#10314)
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-05 15:30:18 +08:00
Pengyun Lin
c04cf4334e
[TRTLLM-8242][feat] Add stability tags for serve subcommand (#10012)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-05 14:16:15 +08:00
Yukun He
0937df2c68
[TRTLLM-10185][feat] AutoTuner Cache: Support cache file lock and merge all ranks into one (#10336)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-05 13:44:09 +08:00