heyuhhh
5f07c4e5e1
Merge 8d858f912e into 6df2c8a074
2026-01-13 21:15:46 +08:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce (#9729)
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading (#10562)
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Void
7d16f3a28b
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-13 17:16:22 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. (#10347)
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
JunyiXu-nv
e291a834db
[TRTLLM-8462][feat] Support GET/DELETE v1/responses/{response_id} (#9937)
...
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
2026-01-13 03:57:14 -05:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. (#10480)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
heyuhhh
8d858f912e
Merge branch 'main' into user/yuhangh/support_export_data_in_eval
2026-01-13 10:01:12 +08:00
xxi
ba1037ca4a
[https://nvbugs/5762336][fix] support to parse the keyword modules_to_not_convert of the HF model config (#10527)
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[https://nvbugs/5689235][fix] Fix cancellation+chunked prefill+disagg (#10111)
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Gal Hubara-Agam
18a33764b5
[None][chore] Print correct backend name in benchmark report (#10597)
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-01-12 14:46:00 -05:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe (#10401)
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support (#10355)
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
yuhangh
ac196c70ad
update dump logic
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-10 04:27:40 +00:00
yuhangh
230022890b
pre-commit fix
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-10 04:27:39 +00:00
yuhangh
c9e518cd24
Use output_dir and save both of prompt ids and text
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-10 04:27:35 +00:00
yuhangh
cbc67b7c76
Support to export data in trtllm-eval
...
Signed-off-by: yuhangh <58161490+heyuhhh@users.noreply.github.com>
2026-01-10 04:26:43 +00:00
Faraz
fdbdbba540
[https://nvbugs/5752687][fix] Choose register model config over root config for VLM (#10553)
...
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
2026-01-09 12:10:52 -05:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test (#10435)
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA (#10571)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Yuxian Qiu
80f261ea36
[https://nvbugs/5622938][feat] Run sample_async on extra stream. (#10215)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-09 18:15:18 +08:00
Chang Liu
78bb245554
[https://nvbugs/5787453][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 (#10552)
2026-01-09 00:49:39 -08:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case (#9873)
...
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Yuxian Qiu
afa55c12b6
[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445 . (#10547)
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 21:50:04 -05:00
Mike Iovine
4092a87b6f
[https://nvbugs/5740075][fix] Fix sm120 speculation (#10049)
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
Eran Geva
489dd60312
[#10513][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor (#10512)
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 14:49:40 -05:00
mpikulski
e0331297a6
[TRTLLM-9522][fix] broken cast (#9975)
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-01-08 06:47:39 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL (#10470)
...
* Why?
We would like to support EPD disaggregated serving for Qwen3 VL.
* What?
This commit adds such support, and extends existing unit tests for
correctness checks.
Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
Eran Geva
6511dbaea0
[#10417][fix] AutoDeploy - Reverted to direct computation of minusA (#10509)
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 13:43:41 +02:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine (#10405)
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yiqing Yan
dc6b743fb6
[None][chore] Bump version to 1.2.0rc8 (#10542)
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-08 04:51:44 -05:00
Yukun He
09d9878385
[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. (#10339)
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-08 10:21:02 +08:00
Ziyi Xiong
7187afe7b9
[https://nvbugs/5781589][fix] Skip spec dec for non-last rank (#10445)
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-01-07 13:55:45 -05:00
tcherckez-nvidia
7e88212d24
[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct (#10455)
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-07 10:30:24 +02:00
Kanghwan
dc32bac9fc
[#4745][fix] Pass lora_params through Qwen2/3 model forward (#10174)
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2026-01-07 15:30:17 +08:00
Fanrong Li
a34aa63685
[https://nvbugs/5767223][feat] add pp support for DeepSeek-v3.2 (#10449)
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
Zongfei Jing
bb2f883296
[None] [feat] Add test script and raster M for gather fc1 kernel (#10429)
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-07 09:31:49 +08:00
Lucas Liebenwein
bb6a3973aa
[https://nvbugs/5732942][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes (#10466)
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 19:55:49 -05:00
Lizhi Zhou
6a4bebcd01
[None][chore] remove redundant retries while binding to arbitrary port (#10452)
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-06 10:39:15 -05:00
Kaiyu Xie
2eaabd7461
[None] [fix] Fix undefined tokens_per_block (#10438)
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-06 02:42:37 -05:00
Karthik
617f728903
[#8460][feat] Revive and simplify Model Explorer visualization integration (#10150)
...
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-05 22:15:25 -05:00
Xiao Xuan
46f035befe
[#2511][fix] eagle: qwen2 capture hidden states (#10091)
...
Signed-off-by: SpicyNoodle <522169030@qq.com>
2026-01-05 21:46:41 -05:00
alel
6b8ae6fa81
[None][feat] CuteDSL MOE FC1 Enhancement (#10088)
...
Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>
2026-01-06 09:30:43 +08:00
JadoTu
82aaf98070
[None][feat] add the eos tokens in generation config to stop words in the sampler (#10389)
...
Signed-off-by: jiant <107457950+JadoTu@users.noreply.github.com>
2026-01-06 09:24:03 +08:00
Karthik
4e50cb5708
[#10170][fix] Add export patch for GraniteMoe MoE models to enable torch.export compatibility (#10169)
...
Signed-off-by: Karthik Vetrivel <kvetrivel@nvidia.com>
2026-01-05 16:13:45 -05:00
Grzegorz Kwasniewski
ea380ff45c
[TRTLLM-9767][feat] Fixed recursive node traversals (#10379)
...
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-01-05 18:42:06 +02:00
Mike Iovine
db2614ef10
[https://nvbugs/5772414][fix] Fix draft token tree depth=1 corner case (#10385)
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:20:14 +01:00
Mike Iovine
bedfff4f00
[https://nvbugs/5772521][fix] Fix draft token tree chain crash (#10386)
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-01-05 17:18:44 +01:00
Anthony Chang
225d3a9001
[None][perf] TRTLLM MoE maps to lower tuning buckets when ep>1 (#9998)
...
Signed-off-by: Anthony Chang <27950904+rosenrodt@users.noreply.github.com>
2026-01-05 17:16:12 +01:00
Balaram Buddharaju
a792c23dcf
[TRTLLM-9465][fix] Swap TP-CP grouping order (#10350)
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-05 20:08:03 +08:00