Lizhi Zhou
1524c172a4
[https://nvbugs/5821433][fix] WAR for popen in QA env (#10989)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Balaram Buddharaju
5f8b1b8cbb
[https://nvbugs/5811087][chore] Unwaive Gemma3 27B multimodal test (#11049)
Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Enwei Zhu
1ba039f044
[https://nvbugs/5819452][ci] Unwaive LLaMA2 7B FP8 case (#10997)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
William Zhang
abb8106c01
[https://nvbugs/5835925][fix] Add EPD disagg support for Qwen3 VL MoE (#10962)
* Why?
Trying to instantiate a `MultimodalEncoder` for a Qwen3 VL MoE model
would fail during weight loading.
* What?
This commit fixes the bug, alongside:
- explicit, intentional EPD support for Qwen3 VL MoE.
- extended EPD unit tests for Qwen3 VL MoE, albeit with dummy weights.
- unit tests for the weight mapper fixes.
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Jin Li
0ead17bb85
[https://nvbugs/5800646][fix] Fix hang issue by avoiding exposing UB buf… (#10842)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
yingguo-trt
d348dd95a7
[None][feat] support Lyris GB200 and increase disagg test timeout (#11019)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Co-authored-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
yufeiwu-nv
fd4e6132e5
[None][test] Fix missing test cases (#10881)
Signed-off-by: yufeiwu-nv <230315618+yufeiwu-nv@users.noreply.github.com>
Co-authored-by: Larry Xu <197874197+LarryXFly@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Stefan Niebler
d50010cd1f
[https://nvbugs/5769815][fix] Fix offset calculation in _are_stop_words when using speculative decoding (#10854)
Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Lizhi Zhou
6c4e0c3dbe
[https://nvbugs/5826689][fix] replace etcd3 with etcd-sdk-python (#10886)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Emma Qiao
c659280445
[None][infra] Waive failed cases for release branch on 01/26 (#10999)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
Pengbo Wang
59f59efb83
[https://nvbugs/5779536][fix] Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
JunyiXu-nv
90ea6c1e09
[https://nvbugs/5804146][fix] Enable responses tests and remove ds to… (#10925)
Signed-off-by: Junyi Xu <219237550+JunyiXu-nv@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-09 23:53:40 +08:00
mpikulski
196d94a419
[TRTLLM-10030][perf] avoid syncs in beam search + other improvements (#11349)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-09 16:13:58 +01:00
Gal Hubara-Agam
2b60cc181c
[#10780][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE (#11322)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-09 10:07:37 -05:00
Lizhi Zhou
540fb0f29e
[https://nvbugs/5834212][chore] unwaive test_disaggregated_mixed (#11372)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 09:16:25 -05:00
Robin Kobus
b3e4ddc953
[None][test] Enhance multi-GPU tests for IFB stats (#11239)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:25:32 +08:00
Robin Kobus
31db399042
[https://nvbugs/5829097][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver (#11354)
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:11:45 +08:00
Bo Li
ab73f6ebc6
[None][chore] Add microbench for MoE Comm methods. (#10317)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-09 02:57:01 -05:00
Yihan Wang
635d65f9fe
[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch (#11168)
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-09 13:57:57 +08:00
Emma Qiao
ad8f6748a3
[None][infra] Waive failed case for main branch on 02/09 (#11369)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-08 23:05:33 -05:00
TensorRT LLM
fe9192f120
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-09 03:16:42 +00:00
Yanchao Lu
b464c75056
[None][ci] Waive test failures on main 02/08 (#11365)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-08 22:50:37 +08:00
TensorRT LLM
f7cf25748b
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-08 03:10:28 +00:00
mpikulski
03b38e9fbf
[TRTLLM-10030][perf] avoid sync in PyTorchModelEngine when using beam search (#11341)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-07 12:31:11 +08:00
William Zhang
ffc0f54959
[https://nvbugs/5848756][fix] Re-take ownership of mrope tensors in prefill worker (#11217)
* Why?
Previously, the mrope tensors' IPC handles would just be forwarded from
encode -> prefill -> decode workers. While this is fine for the
prefill worker, it is not for the decode worker, since by the time it
tries to rebuild those tensors, they could have been garbage collected
due to their refcounts reaching zero in the producer (encode) worker.
This could lead to nasty runtime errors when running E/P/D
disaggregated serving.
* What?
This commit fixes this by having the prefill worker take ownership of
those reconstructed tensors, and stand up new copies for the decode
worker.
Closes: NvBug 5848756
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-06 22:37:42 -05:00
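The ownership hazard described in the commit above can be sketched with a small stdlib analogy (all names here are illustrative, not the actual TensorRT-LLM API): a non-owning IPC handle is modeled as a `weakref`, so the tensor vanishes once the producer drops it, while a consumer-side copy survives.

```python
import weakref


class Tensor:
    """Hypothetical stand-in for a GPU tensor shared over IPC."""

    def __init__(self, data):
        self.data = data


# Producer (encode worker) creates the mrope tensor.
producer_tensor = Tensor([1, 2, 3])

# An IPC handle does not own the storage -- model it as a weak reference.
handle = weakref.ref(producer_tensor)

# Prefill worker takes ownership by standing up a real copy for the
# decode worker, instead of merely forwarding the handle.
owned_copy = Tensor(list(handle().data))

# Once the producer's refcount reaches zero...
del producer_tensor

assert handle() is None            # ...the bare handle now dangles,
assert owned_copy.data == [1, 2, 3]  # but the owned copy survives.
```

Forwarding only `handle` to the decode worker would reproduce the failure mode: by the time it dereferences the handle, the producer may already have released the storage.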
TensorRT LLM
408d610877
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-07 03:13:33 +00:00
Iman Tabrizian
18e611da77
[https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-06 14:23:51 -05:00
Gal Hubara-Agam
f9eed3ecc2
[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds (#11107)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-06 14:55:18 +02:00
Shi Xiaowei
b1268e1b37
[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-02-06 07:15:18 -05:00
Bo Li
66caa67357
[None][doc] Add sparse attention docs to index. (#11342)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-06 17:53:41 +08:00
Yueh-Ting (eop) Chen
383c5921c2
[https://nvbugs/5756028][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation (#10798)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-02-06 14:28:47 +08:00
Emma Qiao
09807918c7
[None][infra] Waive failed case and delete the redundant waives (#11331)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-06 13:56:51 +08:00
Zongfei Jing
df1c1a23d4
[https://nvbugs/5722629][fix] Remove waive for nvbug 5722629 (#11278)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:51:30 -05:00
Chenghao Zhang
9644f024bd
[None][feat] AutoDeploy: add triton backend for causal conv (#11124)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:33:00 -08:00
Chenghao Zhang
d160439ef9
[#11148][feat] AutoDeploy: Better structure the custom op (#11152)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:32:22 -08:00
Bo Li
639051e98b
[TRTLLM-10021][docs] Skip Softmax Attention blog and docs. (#10592)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-06 12:11:21 +08:00
TensorRT LLM
2e6d9350fa
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-06 03:26:42 +00:00
Yan Chunwei
b98f3fca20
[https://nvbugs/5744432][fix] fix bench script test (#10483)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-02-06 11:02:24 +08:00
Simeng Liu
86e867297e
[https://nvbugs/5856637][ci] Remove the skip for fixed tests. (#11285)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-05 21:45:00 -05:00
yifeizhang-c
5521c7b7e7
[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130)
Added FP8 cute dsl gemm and batch gemm.
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2026-02-06 09:49:30 +08:00
Lucas Liebenwein
712dcd31a9
[https://nvbugs/5859869][fix] remove test waive since test is already deprecated (#11288)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-05 20:42:43 -05:00
Chuang Zhu
a9d4927235
[TRTLLM-10752][chore] set default val of max_num_tokens_in_buffer as max_seq_len or max_input_len (#11082)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-02-05 14:54:00 -05:00
Harris Nover
a7494a5ff4
[None][chore] Remove outdated comment in model_engine.py (#11240)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-05 13:54:46 -05:00
jthomson04
d778b26062
[None][fix] Reduce host memory usage during model loading (#11119)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-02-05 08:57:40 -08:00
nvyocox
e52eb82780
[#11234][test] Move test_ad_export_onnx to integration examples (#11260)
Signed-off-by: yocox <yocox@nvidia.com>
2026-02-05 11:32:57 -05:00
Yuxian Qiu
d3d951d837
[None][fix] Fix amax to avoid NaN issue in fp8_blockscale_gemm_kernel. (#11256)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-06 00:28:29 +08:00
mpikulski
7d235cfb23
[TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 16:33:22 +01:00
chenfeiz0326
eae480b713
[https://nvbugs/5820874][fix] Adjust deepgemm tuning buckets to cover a larger num_tokens range (#11259)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-05 23:12:38 +08:00
mpikulski
719e82c429
[TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 15:33:51 +01:00
Jiayu Chang
e483c7263d
[None][docs] Add CUDA Graph + LoRA to Feature Combination Matrix (#11187)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-02-05 15:01:59 +01:00