William Zhang
ca9537e17c
[TRTLLM-10858][feat] Multi-image support for EPD disagg ( #11264 )
...
* Why?
Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.
* What?
This commit does a minor refactor of the multimodal embedding handles
that cross process boundaries so that a request can carry more than one
multimodal input.
Existing unit tests are updated to cover this.
The `RequestOutput` has its `mm_embedding_handle` replaced in favor of
`disaggregated_params`, addressing a previous TODO.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-11 20:50:00 -08:00
xinhe-nv
42648734b8
[None][chore] Add failed cases into waives.txt ( #11392 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-11 21:52:29 -05:00
Yukun He
632c039aea
[TRTLLM-10793][feat] Add BOLT compatible build flags for further experimental usage. ( #11297 )
...
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 09:54:58 +08:00
Liao Lanyu
58165d5394
[None][chore] Introducing an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations ( #11330 )
...
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-12 09:18:24 +08:00
Harris Nover
2c4a4c7b94
[None][fix] Fix out-of-bounds array access in kernel factory Get() methods ( #11373 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 19:21:01 -05:00
Harris Nover
2d5ebb3fe8
[None][chore] Merge residual+hidden into layer norm at the end of each NemotronH MTP, and remove a % operation ( #11406 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-11 12:01:36 -05:00
Robin Kobus
7a103035be
[None][fix] Remove overlap scheduler adjustment for max sequence length in create_py_executor function ( #9229 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-11 08:46:25 -08:00
Guoming Zhang
c47ff4da43
[None][feat] Remove the hard code for activation type definition in T… ( #11164 )
...
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-02-11 21:50:45 +08:00
Emma Qiao
eed9c16560
[None][infra] Pin the torchao version ( #11444 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-11 17:40:40 +08:00
Yihan Wang
e8b860965b
[None][feat] Initial PR for trtllm-gen attention backend ( #10784 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-11 17:16:52 +08:00
Bo Li
18c992efb1
[None][doc] Update Skip Softmax attention blog. ( #11443 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-11 16:08:16 +08:00
Emma Qiao
8ebd6056fa
[None][infra] Waive failed cases for main on 2/11 ( #11441 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-11 15:25:52 +08:00
Song Rong
3741bb2bb4
[None][chore] Lock FI version to 0.6.3 ( #11371 )
...
Signed-off-by: rosong11 <rosong@nvidia.com>
2026-02-11 14:47:36 +08:00
Bo Li
5ea6888dda
[ https://nvbugs/5810940 ][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. ( #11176 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-02-11 00:54:40 -05:00
peihengh
a982554190
[ https://nvbugs/5868038 ][fix] Gracefully terminate disagg serving servers to prevent leftover subprocess warnings ( #11395 )
...
Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>
2026-02-10 22:41:37 -05:00
Taylor Yeonbok Lee
860054c859
[ #11203 ][feat] AutoDeploy: Refactor node caching and improve engine build time ( #11250 )
...
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-10 13:35:44 -08:00
tburt-nv
f320bc8a9c
[None][chore] Update allowlist 2026-02-10 ( #11426 )
...
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-02-10 14:28:41 -05:00
Matt Lefebvre
a7c4005a3d
[None][infra] Use frontend dgx-h100 and b200 slurm platforms ( #11251 )
...
Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>
Signed-off-by: Matt Lefebvre <matthewelefebvre@gmail.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-10 07:51:33 -08:00
mpikulski
411fa9ff87
[TRTLLM-10030][perf] pin host memory and batch sampler setup in beam search ( #11390 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 16:48:36 +01:00
Iman Tabrizian
7d992972b2
[TRTLLM-10273][feat] Move MambaCacheManager from Python to C++ ( #10540 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-10 07:20:56 -08:00
Leslie Fang
d6e49542bd
[ https://nvbugs/5848377 ][fix] fix deepeplowlatency with trtllm moe backend running fp8 DS_R1 ( #11266 )
...
Signed-off-by: leslie-fang25 <leslief@nvidia.com>
Signed-off-by: Leslie Fang <leslief@nvidia.com>
Co-authored-by: Tailing Yuan <yuantailing@gmail.com>
2026-02-10 20:09:00 +08:00
Yiqing Yan
cf02456613
[TRTLLM-9711][infra] Fix the testcase name in timeout xml ( #9781 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-02-10 18:50:42 +08:00
xinhe-nv
c7689df152
[None][chore] Add failed cases into waives.txt ( #11396 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-10 05:50:16 -05:00
xinhe-nv
6e0659dc4d
[None][chore] Add failed cases into waives.txt ( #11363 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-10 05:48:33 -05:00
chenfeiz0326
eac56b793e
[ https://nvbugs/5853720 ][fix] Disable cutedsl argmax kernel to fix perf regression ( #11403 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-10 18:10:38 +08:00
Bo Deng
be88fe33be
[None][fix] fix tinygemm accuracy ( #11411 )
...
Signed-off-by: Bo Deng <deemod@nvidia.com>
2026-02-10 05:09:30 -05:00
mpikulski
adc0d82500
[ https://nvbugs/5791242 ][chore] remove obsolete code ( #11388 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-10 10:55:29 +01:00
Yiqing Yan
21cdc39e83
[TRTLLM-10331][infra] Upload unittest sub results in slurm ( #10834 )
...
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-10 17:53:35 +08:00
dominicshanshan
2a4e70b4a9
[None][chore] Unwaive tests after last MI ( #11400 )
...
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-10 17:12:39 +08:00
Emma Qiao
8a74ccc57e
[None][infra] Waive failed cases for main branch on 02/10 ( #11413 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-10 03:21:59 -05:00
Yuxian Qiu
5f4df89109
[None][feat] Fully non-blocking pipeline parallelism executor loop. ( #10349 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 15:43:28 +08:00
Lizhi Zhou
c233692485
[None][doc] add multiple-instances section in disaggregated serving doc ( #11412 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-10 02:31:45 -05:00
Emma Qiao
17cc1c13d6
[None][infra] Enable spark ci since spark cloud migration is done ( #11407 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-10 01:47:22 -05:00
shuyixiong
c3cdc93211
[TRTLLM-9771][feat] Make update_weights compatible with CUDA Graph ( #11267 )
...
Signed-off-by: Shuyi Xiong <219646547+shuyixiong@users.noreply.github.com>
2026-02-10 01:12:49 -05:00
Jonas Li
8b2dc57823
[None][chore] Mass merge commits from release/1.2.0rc6.post1 branch ( #11384 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
Co-authored-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
Co-authored-by: Iman Tabrizian <10105175+Tabrizian@users.noreply.github.com>
2026-02-10 14:00:42 +08:00
Venky
0c8b5221b4
[TRTC-264][doc] Add CLAUDE.md and AGENTS.md ( #11358 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-02-09 20:29:58 -08:00
Lucas Liebenwein
a2fb5afecf
[ #11032 ][feat] MLA revisited and GLM 4.7 Flash support ( #11324 )
2026-02-09 23:26:51 -05:00
Venky
d50f010fa9
[TRTC-265][chore] Add CODEOWNERS coverage for serve/ and commands/ directories ( #11359 )
...
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-02-09 22:52:09 -05:00
Emma Qiao
85919d9517
[None][infra] Disable spark stages due to migration of spark cloud ( #11401 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-09 22:31:09 -05:00
Yuan Tong
4fc3644705
[None][fix] Avoid reserved filename on Windows ( #11382 )
...
Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>
2026-02-10 11:22:59 +08:00
JennyLiu
b5508ed75b
[None][test] Add DGX-Spark multinode perf cases including eagle3 ( #11184 )
...
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-02-10 10:44:41 +08:00
Mike Iovine
f33086914f
[ https://nvbugs/5843112 ][chore] Unwaive ngram test ( #11320 )
...
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
2026-02-09 21:31:29 -05:00
Yuxian Qiu
af68c29d3d
[None][chore] Reduce attention module repeated warnings. ( #11335 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-10 08:58:21 +08:00
Lucas Liebenwein
fe4c690b6c
[ https://nvbugs/5855540 ][fix] AutoDeploy: thread cleanup of eagle test ( #11289 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-09 18:01:12 -05:00
Ziyi Xiong
e76b634251
[TRTLLM-10321][feat] Support different KV cache layout for one-model spec dec ( #10502 )
...
Signed-off-by: ziyixiong-nv <219238287+ziyixiong-nv@users.noreply.github.com>
2026-02-10 05:16:02 +08:00
Mike Iovine
092f4ce774
[ https://nvbugs/5853997 ][chore] Unwaive gpt-oss test ( #11287 )
...
Signed-off-by: Mike Iovine <miovine@nvidia.com>
2026-02-09 16:04:41 -05:00
Patrice Castonguay
c68d916b6f
[None][chore] Unit test for disagg gen cancellation ( #11108 )
...
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2026-02-09 14:39:02 -05:00
tcherckez-nvidia
ea81a03dd1
[None][chore] update model list ( #11364 )
...
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-02-09 21:27:39 +02:00
Bala Marimuthu
4a743338c3
[None][infra] AutoDeploy: Dump graph IR after every transform ( #11045 )
...
Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>
2026-02-09 10:43:44 -08:00
Lizhi Zhou
e719721a60
[TRTLLM-10866][feat] implement disaggregated harmony chat ( #11336 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 12:09:03 -05:00