Wanli Jiang
722978b837
[TRTLLM-10305][feat] Support customized seq len larger than model config ( #10600 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-16 16:07:36 +08:00
dongfengy
6dfb8d7084
[None][fix] Fix Piecewise Cuda Graph for GPTOSS ( #10631 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-01-16 15:47:34 +08:00
Yukun He
f001c4946d
[ https://nvbugs/5782112 ][fix] Fix hanging issue for MNNVL Allreduce under PP ( #10633 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-16 13:03:36 +08:00
Enwei Zhu
7b8b9ccbaf
[ https://nvbugs/5669671 ][fix] Support GuidedDecoder with sharded logits ( #10698 )
...
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-16 11:04:26 +08:00
Lucas Liebenwein
49c6f73554
[None][bug] AutoDeploy: fix regression in kv cache resize memory estimation ( #10726 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-16 09:52:03 +08:00
Lizhi Zhou
93db0d5e18
[TRTLLM-9942][feat] new request states and kvcache transceiver APIs in generation-first disagg ( #10406 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 19:18:21 +08:00
Lizhi Zhou
ff277b591e
[ https://nvbugs/5791830 ][fix] fix pp loop hang caused by i-sending new requests ( #10665 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-15 16:33:55 +08:00
Anish Shanbhag
faa80e73fd
[None][feat] Auto download speculative models from HF for pytorch backend, add speculative_model field alias ( #10099 )
...
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-01-14 21:06:07 -08:00
Void
f7de285a82
[None][fix] add quantization check for DeepEP LL low precision combine in new moe comm api ( #10072 )
...
Signed-off-by: Yilin Zhang <18275976+yilin-void@users.noreply.github.com>
2026-01-14 22:15:29 -05:00
彭晋韬(jtao peng)
211c44b951
[None][feat] Adding torch ext API for FusedAddRMSNormQuant kernel ( #9905 )
...
Signed-off-by: jintaop <jintaop@nvidia.com>
2026-01-15 07:29:15 +08:00
Emma Qiao
01083b56bf
[TRTLLM-9849][infra] Update dependencies to 25.12 ( #9818 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Signed-off-by: xxi <xxi@nvidia.com>
Signed-off-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Co-authored-by: xxi <xxi@nvidia.com>
Co-authored-by: xxi <95731198+xxi-nv@users.noreply.github.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-14 21:54:04 +08:00
HuiGao-NV
b10704428d
[ https://nvbugs/5787566 ][fix] Only keep a limited number of performance statistic data ( #10569 )
...
Signed-off-by: Hui Gao <huig@nvidia.com>
2026-01-14 07:53:01 -05:00
Kyungmin Lee
25148d3fee
[None][feat] Support new Transformers RoPE configuration format ( #10636 )
...
Signed-off-by: lkm2835 <lkm2835@gmail.com>
2026-01-14 19:41:27 +09:00
xxi
e9817461ba
[None][chore] improve the readability of log for cutlass can only sup… ( #10630 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:33:45 -05:00
xxi
d8862505b9
[None][chore] enable EPLB for DEEPGEMM ( #10617 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-14 05:28:08 -05:00
jmydurant
e7882d5c74
[None][feat] MiniMax M2 support ( #10532 )
...
Signed-off-by: Mingyang Jiang <13463932+jmydurant@users.noreply.github.com>
2026-01-14 17:38:58 +08:00
Yukun He
15281de799
[None][fix] Reduce host overhead for unified nvfp4 gemm tuning path. ( #10503 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-14 14:26:18 +08:00
Yuxian Qiu
39cefd6125
[None][refactor] Unify the usage of MPIDist and TorchDist. ( #10380 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 14:05:47 +08:00
Leslie Fang
795e690bca
[ https://nvbugs/5753788 ][chore] Padding empty chunk for configurable moe ( #10451 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
2026-01-14 10:42:17 +08:00
Yuxian Qiu
d3f4fbb742
[None][fix] Avoid write-write race for async pp send. ( #10488 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:39:36 +08:00
Yuxian Qiu
2acd03030a
[ https://nvbugs/5781589 ][fix] Implement pp skip forward for all spec workers. ( #10578 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-14 09:36:35 +08:00
Balaram Buddharaju
ccdfa43a6e
[ https://nvbugs/5791900 ][fix] Fix HelixCpMnnvlMemory init with PP ( #10533 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-01-13 15:48:42 -05:00
Frida Hou
bf16fbd86c
[ #9283 ][feat] AutoDeploy: separate rms pattern detection from fusion ( #9969 )
...
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
2026-01-13 14:57:27 -05:00
Neta Zmora
7b7f1e2ba1
[None][feat] AutoDeploy: refactor memory usage logging ( #8505 )
...
Signed-off-by: Neta Zmora <96238833+nzmora-nvidia@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-01-13 21:03:09 +02:00
benzh-2025
6df2c8a074
[None][feat] add fp4 gemm + allreduce ( #9729 )
...
Signed-off-by: benzh
Signed-off-by: benzh-2025
2026-01-13 21:11:13 +08:00
Tailing Yuan
38296a472b
[None][feat] Layer-wise benchmarks: make model init more general and support weights loading ( #10562 )
...
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-13 19:17:03 +08:00
Guoming Zhang
bdaee87895
[TRTLLM-10060][feat] Enable attention dp for Nemotron Super v3. ( #10347 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-01-13 17:13:55 +08:00
Yuxian Qiu
04b112651b
[None][feat] Hang detection for executor loop and worker. ( #10480 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-13 02:34:32 -05:00
xxi
ba1037ca4a
[ https://nvbugs/5762336 ][fix] support to parse the keyword modules_to_not_convert of the HF model config ( #10527 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-01-12 20:21:01 -05:00
Iman Tabrizian
48b09e5a25
[ https://nvbugs/5689235 ][fix] Fix cancellation+chunked prefill+disagg ( #10111 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-01-12 18:23:26 -05:00
Xianjie Qiao
3a9a00b544
[None][feat] Add ExpertStatistic and DUMMY_ALLREDUCE for configurable_moe ( #10401 )
...
Signed-off-by: Xianjie <5410381+qiaoxj07@users.noreply.github.com>
2026-01-12 14:10:31 +08:00
Yechan Kim
8e0d20d901
[TRTLLM-10195][feat] K-EXAONE support ( #10355 )
...
Signed-off-by: Jaedeok Kim <jaedeokk@nvidia.com>
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Co-authored-by: Jaedeok Kim <jaedeokk@nvidia.com>
2026-01-12 00:29:51 +09:00
Yechan Kim
7295af68ba
[None][fix] Enable AttentionDP on Qwen3-VL and fix test ( #10435 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-01-10 00:13:26 +09:00
Kaiyu Xie
1c69aad850
[TRTLLM-10309] [feat] Optimize qk rope/nope concat for DSA ( #10571 )
...
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-09 09:50:57 -05:00
Yuxian Qiu
80f261ea36
[ https://nvbugs/5622938 ][feat] Run sample_async on extra stream. ( #10215 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-09 18:15:18 +08:00
Chang Liu
78bb245554
[ https://nvbugs/5787453 ][fix] Better align MLA chunking with indexer chunking when chunked prefill enabled for DSV32 ( #10552 )
2026-01-09 00:49:39 -08:00
JadoTu
4c498bfe58
[TRTLLM-9676][fix] Fix mamba_cache_manager when enabling cuda_graph_padding and let test cover this case ( #9873 )
...
Signed-off-by: JadoTu <107457950+JadoTu@users.noreply.github.com>
2026-01-09 14:50:16 +08:00
Yuxian Qiu
afa55c12b6
[None][fix] revert https://github.com/NVIDIA/TensorRT-LLM/pull/10445 . ( #10547 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-08 21:50:04 -05:00
Mike Iovine
4092a87b6f
[ https://nvbugs/5740075 ][fix] Fix sm120 speculation ( #10049 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-01-08 19:55:43 -05:00
Eran Geva
489dd60312
[ #10513 ][fix] AutoDeploy: removed self.mlp_type leftovers from last moe refactor ( #10512 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 14:49:40 -05:00
William Zhang
c0ae6bbdbe
[None][feat] EPD for Qwen3 VL ( #10470 )
...
* Why?
We would like to support EPD disaggregated serving for Qwen3 VL.
* What?
This commit adds such support, and extends existing unit tests for
correctness checks.
Some minor (protected) interface changes had to be made to the
weight mapper as a side-effect.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-01-08 06:45:54 -05:00
Eran Geva
6511dbaea0
[ #10417 ][fix] AutoDeploy - Reverted to direct computation of minusA ( #10509 )
...
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-01-08 13:43:41 +02:00
bhsueh_NV
bea61bb17d
[None][fix] Mistral large 3 few code refine ( #10405 )
...
Signed-off-by: bhsueh <11360707+byshiue@users.noreply.github.com>
2026-01-08 06:38:49 -05:00
Yukun He
09d9878385
[TRTLLM-9661][chore] Further reduce tuning time for cuteDSL nvFP4 dense gemm. ( #10339 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-01-08 10:21:02 +08:00
Ziyi Xiong
7187afe7b9
[ https://nvbugs/5781589 ][fix] Skip spec dec for non-last rank ( #10445 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-01-07 13:55:45 -05:00
tcherckez-nvidia
7e88212d24
[None][bug] fix export for microsoft/Phi-3-medium-128k-instruct ( #10455 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-07 10:30:24 +02:00
Kanghwan
dc32bac9fc
[ #4745 ][fix] Pass lora_params through Qwen2/3 model forward ( #10174 )
...
Signed-off-by: Kanghwan Jang <861393+karljang@users.noreply.github.com>
2026-01-07 15:30:17 +08:00
Fanrong Li
a34aa63685
[ https://nvbugs/5767223 ][feat] add pp support for DeepSeek-v3.2 ( #10449 )
...
Signed-off-by: Fanrong Li <23290157+lfr-0531@users.noreply.github.com>
2026-01-07 12:29:51 +08:00
Zongfei Jing
bb2f883296
[None] [feat] Add test script and raster M for gather fc1 kernel ( #10429 )
...
Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>
2026-01-07 09:31:49 +08:00
Lucas Liebenwein
bb6a3973aa
[ https://nvbugs/5732942 ][fix] AutoDeploy: handle transformers 4.57.1 upgrade fixes ( #10466 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-06 19:55:49 -05:00