Thor Johnsen
29e44dd749
[None][fix] Add cacheSaltID property to BlockKey serialization code (#11457)
Signed-off-by: thorjohnsen <41591019+thorjohnsen@users.noreply.github.com>
2026-02-14 10:22:35 +08:00
Balaram Buddharaju
2989bf5b39
[None][feat] Add new helix kernels for MNNVL-based codepath (#11433)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-14 09:39:24 +08:00
William Zhang
4debf153d8
[#11170][fix] Fix for mm placeholder counts (#11461)
* Why?
As reported by #11170, when a single request contains multiple
messages, and only a subset of those messages include multimodal data,
the previous logic incorrectly adds placeholder tokens to subsequent
messages that do not contain such data.
* What?
This commit fixes the issue and adds unit tests that would have
caught it.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-14 09:12:03 +08:00
Suyog Gupta
b4e9669d2c
[None][chore] Optimize MOE export by tracing with reduced experts and expanding graph (#11504)
Signed-off-by: Suyog Gupta <41447211+suyoggupta@users.noreply.github.com>
2026-02-13 16:59:30 -08:00
tburt-nv
f164669c04
[None][chore] Adjust waive to avoid sm parsing (#11518)
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-02-13 17:38:40 -05:00
Chang Liu
26901e4aa0
[TRTLLM-10612][feat] Initial support of AIGV models in TRTLLM (#11462)
Signed-off-by: Chang Liu (Enterprise Products) <liuc@nvidia.com>
Signed-off-by: Chang Liu <9713593+chang-l@users.noreply.github.com>
Signed-off-by: Zhenhua Wang <zhenhuaw@nvidia.com>
Co-authored-by: Freddy Qi <junq@nvidia.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Zhenhua Wang <zhenhuaw@nvidia.com>
2026-02-14 06:11:11 +08:00
Pamela Peng
19a3031ecb
[TRTLLM-10329][feat] Fix weight loading for Nemotron 3 models on DGX Spark (#11405)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-13 15:29:41 -05:00
dpitman-nvda
052fe2f7f6
[None][chore] Update allowlist 2026-02-13 (#11512)
Signed-off-by: Derek Pitman <dpitman@nvidia.com>
2026-02-14 01:28:26 +08:00
mpikulski
37c53425c1
[TRTLLM-10030][chore] improve assert in sampler (#11475)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 21:54:28 +08:00
Venky
b67dcd8fef
[None][docs] enable Deepwiki docs (#11492)
Signed-off-by: Venky <23023424+venkywonka@users.noreply.github.com>
2026-02-13 20:25:08 +08:00
Lizhi Zhou
6837e73219
[https://nvbugs/5847284][fix] fix cuda oom error (#11219)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-13 19:04:33 +08:00
mpikulski
0ee757e03a
[TRTLLM-10030][chore] use weakref in atexit handler (#11476)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-13 18:02:29 +08:00
yuanjingx87
ca499d600d
[None][infra] Waive failed test in Post-Merge (#11491)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-02-12 22:57:17 -08:00
Gal Hubara-Agam
d0e7ba102e
[#11455][fix] Fallback to triton_ssm for nvfp4 quantization (#11456)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-13 07:38:37 +02:00
Balaram Buddharaju
db35119c7c
[None][chore] Waive test blocking pre-merge (#11498)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 20:08:14 -08:00
xxi
2565f0f4e4
[TRTLLM-9108][feat] refactor MoE unit tests: add unified ConfigurableMoE test framework (#11437)
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-13 11:05:38 +08:00
dpitman-nvda
45d3792245
[TRTINFRA-7648][chore] Add SECURITY.md file to TensorRT-LLM GitHub (#11484)
Signed-off-by: Derek Pitman <dpitman@nvidia.com>
2026-02-12 20:46:28 -05:00
Iman Tabrizian
dd74f90914
[https://nvbugs/5887893][fix] Make NVML work with older CUDA driver versions (#11465)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-12 18:06:47 -05:00
Ludwig Schneider
5130cbd73e
[None][fix] Pre-Allocation for Auto-Tuning NCCL_SYMMETRIC (#11326)
Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>
2026-02-12 14:31:51 -08:00
Balaram Buddharaju
9c2d23c2e5
[https://nvbugs/5888410][fix] Enable warmup for Helix CP (#11460)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
2026-02-12 14:24:51 -08:00
tburt-nv
07cd3d4ff2
[None][chore] Bump version to 1.3.0rc4 (#11485)
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-12 16:55:23 -05:00
Yukun He
cb1d8d130f
[TRTLLM-10791][feat] TorchSampler general host time optimization (#11141)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 18:05:58 +01:00
Pamela Peng
4b2b1d146b
[https://nvbugs/5810935][test] unwaive RTX 6000 pro tests (#11452)
Signed-off-by: Pamela <179191831+pamelap-nvidia@users.noreply.github.com>
2026-02-12 11:17:45 -05:00
Wanli Jiang
421eb9e39c
[None][feat] Optimize NemotronH model with elementwise and nvfp4 fusion (#11273)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-02-12 09:25:31 -05:00
xinhe-nv
ef7830d137
[None][chore] Add failed cases into waives.txt (#11447)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-02-12 07:47:25 -05:00
JennyLiu
11d79aa875
[https://nvbugs/5832481][test] Add gpt-oss-120b-Eagle3-throughput case on DGX-Spark (#11419)
Signed-off-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
Co-authored-by: Jenny Liu <JennyLiu-nv+JennyLiu@users.noreply.github.com>
2026-02-12 05:33:39 -05:00
Tailing Yuan
31cdbdfd72
[https://nvbugs/5808500][chore] Move DeepEPLowLatency tests to machines that support IBGDA with GPU handles (#11178)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-02-12 03:58:01 -05:00
mpikulski
d0f3c412ff
[TRTLLM-10030][chore] refactor finish reasons tests (#11445)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-12 08:32:50 +01:00
xinhe-nv
3c1323442b
[None][chore] Add failed cases into waives.txt (#11451)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-02-12 02:31:34 -05:00
Eran Geva
31314b9fed
[None][chore] added AutoDeploy nano_v3_scale.yaml (#10845)
Signed-off-by: Eran Geva <19514940+MrGeva@users.noreply.github.com>
2026-02-12 01:37:42 -05:00
Lizhi Zhou
219195688c
[None][chore] fix a bug in PR11336 (#11439)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-12 14:34:14 +08:00
Simeng Liu
12085536df
[TRTLLM-10487][feat] Add user-provided UUID support for multimodal KV cache identification. (#11075)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-12 00:48:47 -05:00
Mandar Deshpande
936220e746
[None][fix] glm engine build dtype (#11246)
Signed-off-by: Mandar Deshpande <razzormandar@gmail.com>
2026-02-12 13:27:04 +08:00
Perkz Zheng
e0b11d6ea0
[https://nvbugs/5804923][none] unwaive test (#11005)
Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>
2026-02-12 13:26:28 +08:00
William Zhang
ca9537e17c
[TRTLLM-10858][feat] Multi-image support for EPD disagg (#11264)
* Why?
Prior to this commit, we only supported a single multimodal input for
E/P/D disaggregated serving.
* What?
This commit does a minor refactor of the multimodal embedding handles
that cross process boundaries to enable this.
Existing unit tests are updated accordingly to test this.
The `mm_embedding_handle` field of `RequestOutput` is replaced by
`disaggregated_params`, addressing a previous TODO.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-11 20:50:00 -08:00
xinhe-nv
42648734b8
[None][chore] Add failed cases into waives.txt (#11392)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-11 21:52:29 -05:00
Yukun He
632c039aea
[TRTLLM-10793][feat] Add BOLT compatible build flags for further experimental usage. (#11297)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
2026-02-12 09:54:58 +08:00
Liao Lanyu
58165d5394
[None][chore] Introducing an abstract WaitingQueue interface to decouple the request scheduling logic from specific queue implementations (#11330)
Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com>
Signed-off-by: Lance Liao <108499334+lancelly@users.noreply.github.com>
Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>
2026-02-12 09:18:24 +08:00
Harris Nover
2c4a4c7b94
[None][fix] Fix out-of-bounds array access in kernel factory Get() methods (#11373)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
2026-02-11 19:21:01 -05:00
Harris Nover
2d5ebb3fe8
[None][chore] Merge residual+hidden into layer norm at the end of each NemotronH MTP, and remove a % operation (#11406)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-11 12:01:36 -05:00
Robin Kobus
7a103035be
[None][fix] Remove overlap scheduler adjustment for max sequence length in create_py_executor function (#9229)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-11 08:46:25 -08:00
Guoming Zhang
c47ff4da43
[None][feat] Remove the hard code for activation type definition in T… (#11164)
Signed-off-by: nv-guomingz <137257613+nv-guomingz@users.noreply.github.com>
2026-02-11 21:50:45 +08:00
Emma Qiao
eed9c16560
[None][infra] Pin the torchao version (#11444)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-11 17:40:40 +08:00
Yihan Wang
e8b860965b
[None][feat] Initial PR for trtllm-gen attention backend (#10784)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-11 17:16:52 +08:00
Bo Li
18c992efb1
[None][doc] Update Skip Softmax attention blog. (#11443)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-11 16:08:16 +08:00
Emma Qiao
8ebd6056fa
[None][infra] Waive failed cases for main on 2/11 (#11441)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-11 15:25:52 +08:00
Song Rong
3741bb2bb4
[None][chore] Lock FI version to 0.6.3 (#11371)
Signed-off-by: rosong11 <rosong@nvidia.com>
2026-02-11 14:47:36 +08:00
Bo Li
5ea6888dda
[https://nvbugs/5810940][fix] Update lm_eval to 4.9.10 and re-enable Skip Softmax Attention tests on CI. (#11176)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
Co-authored-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-02-11 00:54:40 -05:00
peihengh
a982554190
[https://nvbugs/5868038][fix] Gracefully terminate disagg serving servers to prevent leftover subprocess warnings (#11395)
Signed-off-by: peihu-nv <259410613+peihu-nv@users.noreply.github.com>
2026-02-10 22:41:37 -05:00
Taylor Yeonbok Lee
860054c859
[#11203][feat] AutoDeploy: Refactor node caching and improve engine build time (#11250)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-10 13:35:44 -08:00