Bala Marimuthu
6157f30b06
[ #11318 ][infra] AutoDeploy: Add fused rope kernel - triton_rope_on_interleaved_qk_inputs ( #11327 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-18 02:24:18 +08:00
Bala Marimuthu
1c065fbb3e
[ #11109 ][feat] AutoDeploy: GLM 4.7 Flash Improvements ( #11414 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
Co-authored-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Co-authored-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-02-17 08:43:59 -05:00
jthomson04
2450188808
[None][fix] Better error message for mismatched MPI world size ( #11294 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-02-16 15:37:49 -08:00
Yanchao Lu
cc4511997a
[None][revert] - Revert "[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework" ( #11532 )
2026-02-16 21:23:12 +08:00
Suyog Gupta
f3d784c6f6
[ #10345 ][perf] Enable multi-stream MOE for super. Also adds multi-stream MLA attn ( #11520 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-15 15:07:56 -08:00
tcherckez-nvidia
fcb7bea07f
[ #11455 ][bug] Use the torch_dtype set by ModelOpt ( #11525 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-02-15 19:37:59 +02:00
Yi Zhang
361ff36784
[None][feat] Use new index api, add block scale support, fix max_seq_len estimation, add flash mla support ( #11334 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: yizhang-nv <187001205+yizhang-nv@users.noreply.github.com>
2026-02-15 21:40:54 +08:00
Pengbo Wang
2b4ef3a014
[ https://nvbugs/5815025 ][fix] Fix spec-dec mode flag and related cpp requirements ( #10996 )
...
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Yechan Kim
ebd859cf61
[ https://nvbugs/5854419 ][fix] Fix Qwen3-VL-Dense/MoE accuracy drop ( #11134 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Mike Iovine
435ea36977
[None][chore] Add warning about 2-model MTP deprecation ( #11043 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-15 19:57:03 +08:00
Yukun He
ed404f9298
[TRTLLM-10851][feat] Add line_profiler tool for host overhead analysis. ( #11232 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-15 16:18:10 +08:00
Balaram Buddharaju
2989bf5b39
[None][feat] Add new helix kernels for MNNVL-based codepath ( #11433 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-14 09:39:24 +08:00
William Zhang
4debf153d8
[ #11170 ][fix] Fix for mm placeholder counts ( #11461 )
...
* Why?
As reported by #11170 , when a single request contained multiple
messages and only a subset of those messages included multimodal data,
the previous logic incorrectly added placeholder tokens to subsequent
messages that did not contain such data.
* What?
This commit fixes the issue and adds unit tests that would have
caught it.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-14 09:12:03 +08:00
Suyog Gupta
b4e9669d2c
[None][chore] Optimize MOE export by tracing with reduced experts and expanding graph ( #11504 )
...
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-13 16:59:30 -08:00
Chang Liu
26901e4aa0
[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM ( #11462 )
...
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2026-02-14 06:11:11 +08:00
Pamela Peng
19a3031ecb
[TRTLLM-10329][feat] Fix weight loading for Nemotron 3 models on DGX Spark ( #11405 )
...
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-13 15:29:41 -05:00
mpikulski
37c53425c1
[TRTLLM-10030][chore] improve assert in sampler ( #11475 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 21:54:28 +08:00
mpikulski
0ee757e03a
[TRTLLM-10030][chore] use weakref in atexit handler ( #11476 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 18:02:29 +08:00
Gal Hubara-Agam
d0e7ba102e
[ #11455 ][fix] Fallback to triton_ssm for nvfp4 quantization ( #11456 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-13 07:38:37 +02:00
xxi
2565f0f4e4
[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework ( #11437 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-13 11:05:38 +08:00
Ludwig Schneider
5130cbd73e
[None][fix] Pre-Allocation for Auto-Tuning NCCL_SYMMETRIC ( #11326 )
...
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-02-12 14:31:51 -08:00
Balaram Buddharaju
9c2d23c2e5
[ https://nvbugs/5888410 ][fix] Enable warmup for Helix CP ( #11460 )
...
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 14:24:51 -08:00
tburt-nv
07cd3d4ff2
[None][chore] Bump version to 1.3.0rc4 ( #11485 )
...
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-12 16:55:23 -05:00
Yukun He
cb1d8d130f
[TRTLLM-10791][feat] TorchSampler general host time optimization ( #11141 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 18:05:58 +01:00
Wanli Jiang
421eb9e39c
[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion ( #11273 )
...
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-02-12 09:25:31 -05:00
Lizhi Zhou
219195688c
[None][chore] fix a bug in PR11336 ( #11439 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-12 14:34:14 +08:00
Simeng Liu
12085536df
[TRTLLM-10487][feat] Add user-provided UUID support for multimodal KV cache identification. ( #11075 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-12 00:48:47 -05:00
William Zhang
ca9537e17c
[TRTLLM-10858][feat] Multi-image support for EPD disagg ( #11264 )
...
* Why?
Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.
* What?
This commit does a minor refactor of the multimodal embedding handles
that cross process boundaries to enable this, and updates the existing
unit tests accordingly.
The `RequestOutput` has its `mm_embedding_handle` replaced in favor of
`disaggregated_params`, addressing a previous TODO.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-11 20:50:00 -08:00
Liao Lanyu
58165d5394
[None][chore] Introducing an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations ( #11330 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-12 09:18:24 +08:00
Harris Nover
2d5ebb3fe8
[None][chore] Merge residual+hidden into layer norm at the end of each NemotronH MTP, and remove a % operation ( #11406 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-11 12:01:36 -05:00
Robin Kobus
7a103035be
[None][fix] Remove overlap scheduler adjustment for max sequence length in create_py_executor function ( #9229 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-11 08:46:25 -08:00
Guoming Zhang
c47ff4da43
[None][feat] Remove the hard code for activation type definition in T… ( #11164 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-02-11 21:50:45 +08:00
Yihan Wang
e8b860965b
[None][feat] Initial PR for trtllm-gen attention backend ( #10784 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-11 17:16:52 +08:00
Bo Li
5ea6888dda
[ https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. ( #11176 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-02-11 00:54:40 -05:00
Taylor Yeonbok Lee
860054c859
[ #11203 ][feat] AutoDeploy: Refactor node caching and improve engine build time ( #11250 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-10 13:35:44 -08:00
mpikulski
411fa9ff87
[TRTLLM-10030][perf] pin host memory and batch sampler setup in beam search ( #11390 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 16:48:36 +01:00
Iman Tabrizian
7d992972b2
[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ ( #10540 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-10 07:20:56 -08:00
Leslie Fang
d6e49542bd
[ https://nvbugs/5848377 ][fix] fix deepeplowlatency with trtllm moe backend running fp8 DS_R1 ( #11266 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Tailing Yuan <yuantailing@gmail.com>
2026-02-10 20:09:00 +08:00
chenfeiz0326
eac56b793e
[ https://nvbugs/5853720 ][fix] Disable cutedsl argmax kernel to fix perf regression ( #11403 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-10 18:10:38 +08:00
mpikulski
adc0d82500
[ https://nvbugs/5791242 ][chore] remove obsolete code ( #11388 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 10:55:29 +01:00
Yuxian Qiu
5f4df89109
[None][feat] Fully non-blocking pipeline parallelism executor loop. ( #10349 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 15:43:28 +08:00
shuyixiong
c3cdc93211
[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph ( #11267 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2026-02-10 01:12:49 -05:00
Jonas Li
8b2dc57823
[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch ( #11384 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2026-02-10 14:00:42 +08:00
Lucas Liebenwein
a2fb5afecf
[ #11032 ][feat] MLA revisited and GLM 4.7 Flash support ( #11324 )
2026-02-09 23:26:51 -05:00
Yuan Tong
4fc3644705
[None][fix] Avoid reserved filename on Windows ( #11382 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-02-10 11:22:59 +08:00
Yuxian Qiu
af68c29d3d
[None][chore] Reduce attention module repeated warnings. ( #11335 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 08:58:21 +08:00
Ziyi Xiong
e76b634251
[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec ( #10502 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-02-10 05:16:02 +08:00
Bala Marimuthu
4a743338c3
[None][infra] AutoDeploy: Dump graph IR after every transform ( #11045 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-09 10:43:44 -08:00
Lizhi Zhou
e719721a60
[TRTLLM-10866][feat] implement disaggregated harmony chat ( #11336 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 12:09:03 -05:00
Guiju Zhang
c37531c3f7
[TRTLLM-10669][fix] Fix Eagle3 draft model weight loading for throughput checkpoint ( #11010 )
...
Signed-off-by: Guiju Zhang <7135567+cascade812@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00