Pamela Peng
19a3031ecb
[TRTLLM-10329][feat] Fix weight loading for Nemotron 3 models on DGX Spark ( #11405 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-13 15:29:41 -05:00
mpikulski
37c53425c1
[TRTLLM-10030][chore] improve assert in sampler ( #11475 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 21:54:28 +08:00
mpikulski
0ee757e03a
[TRTLLM-10030][chore] use weakref in atexit handler ( #11476 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 18:02:29 +08:00
Gal Hubara-Agam
d0e7ba102e
[ #11455 ][fix] Fallback to triton_ssm for nvfp4 quantization ( #11456 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-13 07:38:37 +02:00
xxi
2565f0f4e4
[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework ( #11437 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-13 11:05:38 +08:00
Ludwig Schneider
5130cbd73e
[None][fix] Pre-Allocation for Auto-Tuning NCCL_SYMMETRIC ( #11326 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-02-12 14:31:51 -08:00
Balaram Buddharaju
9c2d23c2e5
[ https://nvbugs/5888410 ][fix] Enable warmup for Helix CP ( #11460 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 14:24:51 -08:00
tburt-nv
07cd3d4ff2
[None][chore] Bump version to 1.3.0rc4 ( #11485 )
...
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-12 16:55:23 -05:00
Yukun He
cb1d8d130f
[TRTLLM-10791][feat] TorchSampler general host time optimization ( #11141 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 18:05:58 +01:00
Wanli Jiang
421eb9e39c
[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion ( #11273 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-02-12 09:25:31 -05:00
Lizhi Zhou
219195688c
[None][chore] fix a bug in PR11336 ( #11439 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-12 14:34:14 +08:00
Simeng Liu
12085536df
[TRTLLM-10487][feat] Add user-provided UUID support for multimodal KV cache identification. ( #11075 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-12 00:48:47 -05:00
William Zhang
ca9537e17c
[TRTLLM-10858][feat] Multi-image support for EPD disagg ( #11264 )
...
* Why?
Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.
* What?
This commit does a minor refactor of the multimodal embedding handles
that cross process boundaries so that multiple inputs can be carried.
Existing unit tests are updated accordingly.
`RequestOutput` drops its `mm_embedding_handle` in favor of
`disaggregated_params`, addressing a previous TODO.
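A minimal sketch of the handle plumbing (hypothetical names; the real `DisaggregatedParams` and `RequestOutput` definitions in TensorRT-LLM differ): rather than a single `mm_embedding_handle` on the output, the disaggregated params carry one handle per multimodal input.
```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


# Hypothetical illustration only; the actual TensorRT-LLM DisaggregatedParams
# and RequestOutput classes have different fields.
@dataclass
class DisaggParamsSketch:
    ctx_request_id: Optional[int] = None
    # One embedding handle per multimodal input, replacing the single
    # mm_embedding_handle previously attached to the request output.
    multimodal_embedding_handles: List[Any] = field(default_factory=list)


def bundle_mm_handles(handles: List[Any]) -> DisaggParamsSketch:
    """Bundle all per-image handles so they cross the E->P->D boundary together."""
    return DisaggParamsSketch(multimodal_embedding_handles=list(handles))
```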
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-11 20:50:00 -08:00
Liao Lanyu
58165d5394
[None][chore] Introducing an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations ( #11330 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-12 09:18:24 +08:00
Harris Nover
2d5ebb3fe8
[None][chore] Merge residual+hidden into layer norm at the end of each NemotronH MTP, and remove a % operation ( #11406 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-11 12:01:36 -05:00
Robin Kobus
7a103035be
[None][fix] Remove overlap scheduler adjustment for max sequence length in create_py_executor function ( #9229 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-11 08:46:25 -08:00
Guoming Zhang
c47ff4da43
[None][feat] Remove the hardcoded activation type definition in T… ( #11164 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-02-11 21:50:45 +08:00
Yihan Wang
e8b860965b
[None][feat] Initial PR for trtllm-gen attention backend ( #10784 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-11 17:16:52 +08:00
Bo Li
5ea6888dda
[ https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. ( #11176 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-02-11 00:54:40 -05:00
Taylor Yeonbok Lee
860054c859
[ #11203 ][feat] AutoDeploy: Refactor node caching and improve engine build time ( #11250 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-10 13:35:44 -08:00
mpikulski
411fa9ff87
[TRTLLM-10030][perf] pin host memory and batch sampler setup in beam search ( #11390 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 16:48:36 +01:00
Iman Tabrizian
7d992972b2
[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ ( #10540 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-10 07:20:56 -08:00
Leslie Fang
d6e49542bd
[ https://nvbugs/5848377 ][fix] fix deepeplowlatency with trtllm moe backend running fp8 DS_R1 ( #11266 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Tailing Yuan <yuantailing@gmail.com>
2026-02-10 20:09:00 +08:00
chenfeiz0326
eac56b793e
[ https://nvbugs/5853720 ][fix] Disable cutedsl argmax kernel to fix perf regression ( #11403 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-10 18:10:38 +08:00
mpikulski
adc0d82500
[ https://nvbugs/5791242 ][chore] remove obsolete code ( #11388 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 10:55:29 +01:00
Yuxian Qiu
5f4df89109
[None][feat] Fully non-blocking pipeline parallelism executor loop. ( #10349 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 15:43:28 +08:00
shuyixiong
c3cdc93211
[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph ( #11267 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2026-02-10 01:12:49 -05:00
Jonas Li
8b2dc57823
[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch ( #11384 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2026-02-10 14:00:42 +08:00
Lucas Liebenwein
a2fb5afecf
[ #11032 ][feat] MLA revisited and GLM 4.7 Flash support ( #11324 )
2026-02-09 23:26:51 -05:00
Yuan Tong
4fc3644705
[None][fix] Avoid reserved filename on Windows ( #11382 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-02-10 11:22:59 +08:00
Yuxian Qiu
af68c29d3d
[None][chore] Reduce attention module repeated warnings. ( #11335 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 08:58:21 +08:00
Ziyi Xiong
e76b634251
[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec ( #10502 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-02-10 05:16:02 +08:00
Bala Marimuthu
4a743338c3
[None][infra] AutoDeploy: Dump graph IR after every transform ( #11045 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-09 10:43:44 -08:00
Lizhi Zhou
e719721a60
[TRTLLM-10866][feat] implement disaggregated harmony chat ( #11336 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 12:09:03 -05:00
Guiju Zhang
c37531c3f7
[TRTLLM-10669][fix] Fix Eagle3 draft model weight loading for throughput checkpoint ( #11010 )
...
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
William Zhang
abb8106c01
[ https://nvbugs/5835925 ][fix] Add EPD disagg support for Qwen3 VL MoE ( #10962 )
...
* Why?
Trying to instantiate a `MultimodalEncoder` for a Qwen3 VL MoE model
would fail during weight loading.
* What?
This commit fixes the bug, alongside:
- explicit, intentional EPD support for Qwen3 VL MoE.
- extended EPD unit tests for Qwen3 VL MoE, albeit with dummy weights.
- new unit tests for the weight mapper fixes.
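A rough illustration of the weight-mapper idea (hypothetical prefixes; the actual Qwen3 VL MoE checkpoint layout and the TensorRT-LLM weight mapper differ): instantiating a standalone `MultimodalEncoder` means keeping only the vision-encoder tensors and renaming their prefix.
```python
from typing import Dict

import torch


def map_encoder_weights(hf_weights: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Keep only vision-encoder tensors and rename their prefix.

    Hypothetical prefixes for illustration only; the real mapper uses
    different names and handles more cases.
    """
    mapped = {}
    for name, tensor in hf_weights.items():
        if name.startswith("model.language_model."):
            continue  # the standalone encoder does not load LLM weights
        if name.startswith("model.visual."):
            mapped[name.replace("model.visual.", "visual.", 1)] = tensor
    return mapped
```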
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Jin Li
0ead17bb85
[ https://nvbugs/5800646 ][fix] Fix hang issue by avoiding exposing UB buf… ( #10842 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Stefan Niebler
d50010cd1f
[ https://nvbugs/5769815 ][fix] Fix offset calculation in _are_stop_words when using speculative decoding ( #10854 )
...
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Lizhi Zhou
6c4e0c3dbe
[ https://nvbugs/5826689 ][fix] replace etcd3 with etcd-sdk-python ( #10886 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
mpikulski
196d94a419
[TRTLLM-10030][perf] avoid syncs in beam search + other improvements ( #11349 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-09 16:13:58 +01:00
Gal Hubara-Agam
2b60cc181c
[ #10780 ][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE ( #11322 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-09 10:07:37 -05:00
Robin Kobus
31db399042
[ https://nvbugs/5829097 ][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver ( #11354 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:11:45 +08:00
mpikulski
03b38e9fbf
[TRTLLM-10030][perf] avoid sync in PyTorchModelEngine when using beam search ( #11341 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-07 12:31:11 +08:00
William Zhang
ffc0f54959
[ https://nvbugs/5848756 ][fix] Re-take ownership of mrope tensors in prefill worker ( #11217 )
...
* Why?
Previously, the mrope tensors' IPC handles would just be forwarded from
encode -> prefill -> decode workers. While this is fine for the
prefill worker, it is not for the decode worker, since by the time it
tries to rebuild those tensors, they could have been garbage collected
due to their refcounts reaching zero in the producer (encode) worker.
This could lead to nasty runtime errors when running E/P/D
disaggregated serving.
* What?
This commit addresses the issue by having the prefill worker take
ownership of the reconstructed tensors and stand up new copies for the
decode worker.
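A minimal sketch of the ownership hand-off (hypothetical helper; not the actual TensorRT-LLM code): the prefill worker deep-copies the tensor it rebuilt from the encoder's IPC handle, so the copy's lifetime is tied to the prefill process, and only the copy is shared onward.
```python
import torch


def take_ownership(rebuilt: torch.Tensor) -> torch.Tensor:
    """Copy a tensor rebuilt from another process's IPC handle.

    The copy lives in memory owned by this (prefill) process, so it stays
    valid even after the producer (encode) worker drops its reference.
    Hypothetical helper for illustration only.
    """
    return rebuilt.detach().clone()


# The prefill worker would then export the owned copy (e.g. via CUDA IPC /
# torch.multiprocessing) and forward that new handle to the decode worker,
# instead of relaying the encoder's original handle.
```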
Closes: NvBug 5848756
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-06 22:37:42 -05:00
Iman Tabrizian
18e611da77
[ https://nvbugs/5863392 ][fix] fix partial reuse disabled for disagg ( #11247 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-06 14:23:51 -05:00
Shi Xiaowei
b1268e1b37
[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) ( #11225 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-02-06 07:15:18 -05:00
Yueh-Ting (eop) Chen
383c5921c2
[ https://nvbugs/5756028 ][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation ( #10798 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-02-06 14:28:47 +08:00
Chenghao Zhang
9644f024bd
[None][feat] AutoDeploy: add triton backend for causal conv ( #11124 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:33:00 -08:00
Chenghao Zhang
d160439ef9
[ #11148 ][feat] AutoDeploy: Better structure the custom op ( #11152 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:32:22 -08:00
yifeizhang-c
5521c7b7e7
[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell ( #10130 )
...
Added FP8 cute dsl gemm and batch gemm.
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2026-02-06 09:49:30 +08:00