Commit Graph

5008 Commits

Author SHA1 Message Date
Iman Tabrizian
18e611da77
[https://nvbugs/5863392][fix] fix partial reuse disabled for disagg (#11247)
Signed-off-by: Iman Tabrizian <10105175+tabrizian@users.noreply.github.com>
2026-02-06 14:23:51 -05:00
Gal Hubara-Agam
f9eed3ecc2
[None][chore] AutoDeploy update SuperV3 checkpoints and accuracy thresholds (#11107)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
Signed-off-by: Gal Hubara-Agam <96368689+galagam@users.noreply.github.com>
2026-02-06 14:55:18 +02:00
Shi Xiaowei
b1268e1b37
[TRTLLM-9527][feat] Modularization of the transceiver for KV manager v2 (step 4) (#11225)
Signed-off-by: Shixiaowei02 <39303645+Shixiaowei02@users.noreply.github.com>
2026-02-06 07:15:18 -05:00
Bo Li
66caa67357
[None][doc] Add sparse attention docs to index. (#11342)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-06 17:53:41 +08:00
Yueh-Ting (eop) Chen
383c5921c2
[https://nvbugs/5756028][fix] Fix VSWA initialization with spec-dec and boundary condition in context input preparation (#10798)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-02-06 14:28:47 +08:00
Emma Qiao
09807918c7
[None][infra] Waive failed case and delete the redundant waives (#11331)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-02-06 13:56:51 +08:00
Zongfei Jing
df1c1a23d4
[https://nvbugs/5722629][fix] Remove waive for nvbug 5722629 (#11278)
Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
2026-02-06 00:51:30 -05:00
Chenghao Zhang
9644f024bd
[None][feat] AutoDeploy: add triton backend for causal conv (#11124)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:33:00 -08:00
Chenghao Zhang
d160439ef9
[#11148][feat] AutoDeploy: Better structure the custom op (#11152)
Signed-off-by: Chenghao Zhang <211069071+nvchenghaoz@users.noreply.github.com>
2026-02-05 21:32:22 -08:00
Bo Li
639051e98b
[TRTLLM-10021][docs] Skip Softmax Attention blog and docs. (#10592)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-02-06 12:11:21 +08:00
TensorRT LLM
2e6d9350fa
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-06 03:26:42 +00:00
Yan Chunwei
b98f3fca20
[https://nvbugs/5744432][fix] fix bench script test (#10483)
Signed-off-by: Yan Chunwei <328693+Superjomn@users.noreply.github.com>
2026-02-06 11:02:24 +08:00
Simeng Liu
86e867297e
[https://nvbugs/5856637][ci] Remove the skip for fixed tests. (#11285)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-05 21:45:00 -05:00
yifeizhang-c
5521c7b7e7
[TRTLLM-9457][feat] Add cute dsl fp8 gemm for Blackwell (#10130)
Added FP8 cute dsl gemm and batch gemm.

Signed-off-by: Yifei Zhang <219273404+yifeizhang-c@users.noreply.github.com>
2026-02-06 09:49:30 +08:00
Lucas Liebenwein
712dcd31a9
[https://nvbugs/5859869][fix] remove test waive since test is already deprecated (#11288)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-05 20:42:43 -05:00
Chuang Zhu
a9d4927235
[TRTLLM-10752][chore] set default val of max_num_tokens_in_buffer as max_seq_len or max_input_len (#11082)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-02-05 14:54:00 -05:00
Harris Nover
a7494a5ff4
[None][chore] Remove outdated comment in model_engine.py (#11240)
Signed-off-by: Harris Nover <249353502+hnover-nv@users.noreply.github.com>
2026-02-05 13:54:46 -05:00
jthomson04
d778b26062
[None][fix] Reduce host memory usage during model loading (#11119)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
2026-02-05 08:57:40 -08:00
nvyocox
e52eb82780
[#11234][test] Move test_ad_export_onnx to integration examples (#11260)
Signed-off-by: yocox <yocox@nvidia.com>
2026-02-05 11:32:57 -05:00
Yuxian Qiu
d3d951d837
[None][fix] Fix amax to avoid NaN issue in fp8_blockscale_gemm_kernel. (#11256)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-02-06 00:28:29 +08:00
mpikulski
7d235cfb23
[TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 16:33:22 +01:00
chenfeiz0326
eae480b713
[https://nvbugs/5820874][fix] Adjust deepgemm tuning buckets to cover larger num_tokens's scope (#11259)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-05 23:12:38 +08:00
mpikulski
719e82c429
[TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 15:33:51 +01:00
Jiayu Chang
e483c7263d
[None][docs] Add CUDA Graph + LoRA in Feature Combination Matrix (#11187)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-02-05 15:01:59 +01:00
Yuewei Na
0d18b2d7a4
[None][feat] Add priority-based KV cache offload filtering support (#10751)
Signed-off-by: Yuewei Na <yna@nvidia.com>
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
2026-02-05 05:22:56 -05:00
Chang Su
9601b17459
[#11037][fix] Fix proto-to-SamplingParams conversion bugs and add gRPC tests (#11292)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-02-05 05:00:29 -05:00
Yao Yao
d9b936be94
[None][feat] Enhance support for complex models (#11254)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2026-02-05 17:28:26 +08:00
xxi
4c1d9d0c10
[None][chore] Pass without_comm to cutlass and deepgemm (#11229)
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-05 02:07:59 -05:00
Yechan Kim
36cb5f8c93
[https://nvbugs/5747920][fix] Fix multimodal serve test (#11296)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-02-05 15:12:53 +09:00
xinhe-nv
8447a96c29
[None][chore] Add failed cases into waives.txt (#11223)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-02-05 00:27:24 -05:00
dongfengy
ada4a3a28e
[https://nvbugs/5800679][fix] Re-enable test after bug fixed (#11249)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 21:08:27 -08:00
Jin Li
9091a193a8
[https://nvbugs/5837275][fix] Unwaive the failing case that cannot be… (#11137)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-02-05 12:52:10 +08:00
Yi Zhang
ada463d15d
[None][fix] Fix comments for kv cache manager v2 (#11207)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-02-04 23:31:29 -05:00
TensorRT LLM
4adf76d860
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-05 03:42:54 +00:00
dongfengy
0bd4630cd1
[https://nvbugs/5854860][fix] Fix cutedsl argmax on sm120 (#11181)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 17:15:31 -05:00
dongfengy
ad2d1df4a9
[https://nvbugs/5849697][fix] Refine QA Test List for SM120 (#11248)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 11:59:04 -08:00
Simeng Liu
d9fd8cc951
[https://nvbugs/5674665][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-04 12:46:31 -05:00
Gal Hubara-Agam
767b8dcab3
[None][chore] AutoDeploy: Set nanov3 and superv3 configs to use flashinfer ssm (#11183)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-04 09:46:15 -08:00
Grzegorz Kwasniewski
d90a8e5700
[TRTLLM-10673][feat] Improved layer classification for sharding (#10718)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-02-04 18:06:10 +01:00
Lucas Liebenwein
925d911fc0
[#10966][feat] AutoDeploy: kv cache manager integration [2/2] (#11149)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-04 09:44:27 -05:00
Xianjie Qiao
e2bd9cce1e
[None][feat] Support disagg slurm jobs rescheduling (#11218)
2026-02-04 22:10:36 +08:00
Yueh-Ting (eop) Chen
f6fff18142
[https://nvbugs/5624818][fix] Work around accuracy issue by enforcing paged_context_fmha on Hopper for fmha_v2 (#11192)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-02-04 19:21:50 +08:00
Zhenhuan Chen
3d8c1a51bd
[None][feat] move some disagg script's env configs from bash to submit.py (#10223)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-02-04 04:32:04 -05:00
mpikulski
f0ca62b175
[None][fix] make health_generate work with beam search (#11097)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-04 09:46:19 +01:00
xxi
02b80bfd58
[TRTLLM-9111][feat] provide the uniform test framework to test all MoE backends (#11128)
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-04 15:57:56 +08:00
Gal Hubara-Agam
de6931bbfd
[None][fix] Fix selective_state_update perf regression for T=1 decode path (#11194)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-04 09:01:34 +02:00
chenfeiz0326
04b7db3ab5
[TRTLLM-8263][feat] Add Disagg Perf Tests (#10912)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-04 10:16:11 +08:00
tburt-nv
588db0ed64
[None][chore] bump version to 1.3.0rc3 (#11238)
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-04 09:30:45 +08:00
Dmitry Barsukoff
5d522295e9
[None][fix] Set continuous_usage_stats default to False to follow OpenAI protocol (#10644)
Signed-off-by: Dmitry Barsukoff <riZZZhik@gmail.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
2026-02-03 16:04:54 -08:00
Taylor Yeonbok Lee
f9e6045f39
[#11086][feat] Optimize Auto Deploy weight loading by preloading weights to CPU (#11059)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-03 13:23:10 -08:00