Gal Hubara-Agam
2b60cc181c
[ #10780 ][feat] AutoDeploy: Support per-expert scales in FP8 and NVFP4 MoE ( #11322 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-09 10:07:37 -05:00
Lizhi Zhou
540fb0f29e
[ https://nvbugs/5834212 ][chore] unwaive test_disaggregated_mixed ( #11372 )
...
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-09 09:16:25 -05:00
Robin Kobus
b3e4ddc953
[None][test] Enhance multi-GPU tests for IFB stats ( #11239 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:25:32 +08:00
Robin Kobus
31db399042
[ https://nvbugs/5829097 ][fix] Disaggregated serving: Only send finished context requests to the KV cache transceiver ( #11354 )
...
Signed-off-by: Robin Kobus <19427718+Funatiq@users.noreply.github.com>
2026-02-09 17:11:45 +08:00
Bo Li
ab73f6ebc6
[None][chore] Add microbench for MoE Comm methods. ( #10317 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-09 02:57:01 -05:00
Yihan Wang
635d65f9fe
[None][chore] Move test_trtllm_flashinfer_symbol_collision.py to tests/unittest/_torch ( #11168 )
...
Signed-off-by: Yihan Wang <yihwang@nvidia.com>
2026-02-09 13:57:57 +08:00
Emma Qiao
ad8f6748a3
[None][infra] Waive failed case for main branch on 02/09 ( #11369 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-08 23:05:33 -05:00
TensorRT LLM
fe9192f120
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-09 03:16:42 +00:00
Yanchao Lu
b464c75056
[None][ci] Waive test failures on main 02/08 ( #11365 )
...
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-08 22:50:37 +08:00
TensorRT LLM
f7cf25748b
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-08 03:10:28 +00:00
mpikulski
03b38e9fbf
[TRTLLM-10030][perf] avoid sync in PyTorchModelEngine when using beam search ( #11341 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-07 12:31:11 +08:00
William Zhang
ffc0f54959
[ https://nvbugs/5848756 ][fix] Re-take ownership of mrope tensors in prefill worker ( #11217 )
...
* Why?
Previously, the mrope tensors' IPC handles would just be forwarded from
encode -> prefill -> decode workers. While this is fine for the
prefill worker, it is not for the decode worker, since by the time it
tries to rebuild those tensors, they could have been garbage collected
due to their refcounts reaching zero in the producer (encode) worker.
This could lead to nasty runtime errors when running E/P/D
disaggregated serving.
* What?
This commit fixes this by having the prefill worker take ownership of
those reconstructed tensors, and stand up new copies for the decode
worker.
Closes: NvBug 5848756
Signed-off-by: William Zhang <133824995+2ez4bz@users.noreply.github.com>
2026-02-06 22:37:42 -05:00
TensorRT LLM
408d610877
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-07 03:13:33 +00:00
Iman Tabrizian
18e611da77
[ https://nvbugs/5863392 ][fix] fix partial reuse disabled for disagg ( #11247 )
...
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-06 14:23:51 -05:00
Gal Hubara-Agam
f9eed3ecc2
[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds ( #11107 )
...
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-06 14:55:18 +02:00
Shi Xiaowei
b1268e1b37
[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) ( #11225 )
...
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-02-06 07:15:18 -05:00
Bo Li
66caa67357
[None][doc] Add sparse attention docs to index. ( #11342 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-06 17:53:41 +08:00
Yueh-Ting (eop) Chen
383c5921c2
[ https://nvbugs/5756028 ][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation ( #10798 )
...
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-02-06 14:28:47 +08:00
Emma Qiao
09807918c7
[None][infra] Waive failed case and delete the redundant waives ( #11331 )
...
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-06 13:56:51 +08:00
Zongfei Jing
df1c1a23d4
[ https://nvbugs/5722629 ][fix] Remove waive for nvbug 5722629 ( #11278 )
...
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:51:30 -05:00
Chenghao Zhang
9644f024bd
[None][feat] AutoDeploy: add triton backend for causal conv ( #11124 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:33:00 -08:00
Chenghao Zhang
d160439ef9
[ #11148 ][feat] AutoDeploy: Better structure the custom op ( #11152 )
...
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:32:22 -08:00
Bo Li
639051e98b
[TRTLLM-10021][docs] Skip Softmax Attention blog and docs. ( #10592 )
...
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-06 12:11:21 +08:00
TensorRT LLM
2e6d9350fa
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-06 03:26:42 +00:00
Yan Chunwei
b98f3fca20
[ https://nvbugs/5744432 ][fix] fix bench script test ( #10483 )
...
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-02-06 11:02:24 +08:00
Simeng Liu
86e867297e
[ https://nvbugs/5856637 ][ci] Remove the skip for fixed tests. ( #11285 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-05 21:45:00 -05:00
yifeizhang-c
5521c7b7e7
[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell ( #10130 )
...
Added FP8 cute dsl gemm and batch gemm.
Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2026-02-06 09:49:30 +08:00
Lucas Liebenwein
712dcd31a9
[ https://nvbugs/5859869 ][fix] remove test waive since test is already deprecated ( #11288 )
...
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-05 20:42:43 -05:00
Chuang Zhu
a9d4927235
[TRTLLM-10752][chore] set default val of max_num_tokens_in_buffer as max_seq_len or max_input_len ( #11082 )
...
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-02-05 14:54:00 -05:00
Harris Nover
a7494a5ff4
[None][chore] Remove outdated comment in model_engine.py ( #11240 )
...
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-05 13:54:46 -05:00
jthomson04
d778b26062
[None][fix] Reduce host memory usage during model loading ( #11119 )
...
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-02-05 08:57:40 -08:00
nvyocox
e52eb82780
[ #11234 ][test] Move test_ad_export_onnx to integration examples ( #11260 )
...
Signed-off-by: yocox <yocox@nvidia.com>
2026-02-05 11:32:57 -05:00
Yuxian Qiu
d3d951d837
[None][fix] Fix amax to avoid NaN issue in fp8_blockscale_gemm_kernel. ( #11256 )
...
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-06 00:28:29 +08:00
mpikulski
7d235cfb23
[TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes ( #11281 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 16:33:22 +01:00
chenfeiz0326
eae480b713
[ https://nvbugs/5820874 ][fix] Adjust deepgemm tuning buckets to cover larger num_tokens's scope ( #11259 )
...
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-05 23:12:38 +08:00
mpikulski
719e82c429
[TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) ( #11276 )
...
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 15:33:51 +01:00
Jiayu Chang
e483c7263d
[None][docs] Add CUDA Graph + LoRA in Feature Combination Matrix ( #11187 )
...
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-02-05 15:01:59 +01:00
Yuewei Na
0d18b2d7a4
[None][feat] Add priority-based KV cache offload filtering support ( #10751 )
...
Signed-off-by: Yuewei Na <yna@nvidia.com>
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
2026-02-05 05:22:56 -05:00
Chang Su
9601b17459
[ #11037 ][fix] Fix proto-to-SamplingParams conversion bugs and add gRPC tests ( #11292 )
...
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-02-05 05:00:29 -05:00
Yao Yao
d9b936be94
[None][feat] Enhance support for complex models ( #11254 )
...
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2026-02-05 17:28:26 +08:00
xxi
4c1d9d0c10
[None][chore] Pass without_comm to cutlass and deepgemm ( #11229 )
...
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-05 02:07:59 -05:00
Yechan Kim
36cb5f8c93
[ https://nvbugs/5747920 ][fix] Fix multimodal serve test ( #11296 )
...
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-02-05 15:12:53 +09:00
xinhe-nv
8447a96c29
[None][chore] Add failed cases into waives.txt ( #11223 )
...
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-02-05 00:27:24 -05:00
dongfengy
ada4a3a28e
[ https://nvbugs/5800679 ][fix] Re-enable test after bug fixed ( #11249 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 21:08:27 -08:00
Jin Li
9091a193a8
[ https://nvbugs/5837275 ][fix] Unwaive the failing case that cannot be… ( #11137 )
...
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-02-05 12:52:10 +08:00
Yi Zhang
ada463d15d
[None][fix] Fix comments for kv cache manager v2 ( #11207 )
...
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-02-04 23:31:29 -05:00
TensorRT LLM
4adf76d860
[None][infra] Check in most recent lock file from nightly pipeline
...
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-05 03:42:54 +00:00
dongfengy
0bd4630cd1
[ https://nvbugs/5854860 ][fix] Fix cutedsl argmax on sm120 ( #11181 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 17:15:31 -05:00
dongfengy
ad2d1df4a9
[ https://nvbugs/5849697 ][fix] Refine QA Test List for SM120 ( #11248 )
...
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 11:59:04 -08:00
Simeng Liu
d9fd8cc951
[ https://nvbugs/5674665 ][fix] Fix accuracy drop in VSWA with KV cache block reuse ( #10875 )
...
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-04 12:46:31 -05:00