mpikulski
7d235cfb23
[TRTLLM-10030][chore] promote SampleState to TypeVar + typing fixes (#11281)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 16:33:22 +01:00
chenfeiz0326
eae480b713
[https://nvbugs/5820874][fix] Adjust deepgemm tuning buckets to cover larger num_tokens's scope (#11259)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-05 23:12:38 +08:00
mpikulski
719e82c429
[TRTLLM-10030][perf] beam search (remove GPU sync + fix batching + refactor) (#11276)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-05 15:33:51 +01:00
Jiayu Chang
e483c7263d
[None][docs] Add CUDA Graph + LoRA in Feature Combination Matrix (#11187)
Signed-off-by: Jiayu Chang <jiayuc@nvidia.com>
2026-02-05 15:01:59 +01:00
Yuewei Na
0d18b2d7a4
[None][feat] Add priority-based KV cache offload filtering support (#10751)
Signed-off-by: Yuewei Na <yna@nvidia.com>
Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
2026-02-05 05:22:56 -05:00
Chang Su
9601b17459
[#11037][fix] Fix proto-to-SamplingParams conversion bugs and add gRPC tests (#11292)
Signed-off-by: Chang Su <chang.s.su@oracle.com>
2026-02-05 05:00:29 -05:00
Yao Yao
d9b936be94
[None][feat] Enhance support for complex models (#11254)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>
2026-02-05 17:28:26 +08:00
xxi
4c1d9d0c10
[None][chore] Pass without_comm to cutlass and deepgemm (#11229)
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-05 02:07:59 -05:00
Yechan Kim
36cb5f8c93
[https://nvbugs/5747920][fix] Fix multimodal serve test (#11296)
Signed-off-by: yechank <161688079+yechank-nvidia@users.noreply.github.com>
2026-02-05 15:12:53 +09:00
xinhe-nv
8447a96c29
[None][chore] Add failed cases into waives.txt (#11223)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-02-05 00:27:24 -05:00
dongfengy
ada4a3a28e
[https://nvbugs/5800679][fix] Re-enable test after bug fixed (#11249)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 21:08:27 -08:00
Jin Li
9091a193a8
[https://nvbugs/5837275][fix] Unwaive the failing case that cannot be… (#11137)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-02-05 12:52:10 +08:00
Yi Zhang
ada463d15d
[None][fix] Fix comments for kv cache manager v2 (#11207)
Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>
2026-02-04 23:31:29 -05:00
TensorRT LLM
4adf76d860
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-05 03:42:54 +00:00
dongfengy
0bd4630cd1
[https://nvbugs/5854860][fix] Fix cutedsl argmax on sm120 (#11181)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 17:15:31 -05:00
dongfengy
ad2d1df4a9
[https://nvbugs/5849697][fix] Refine QA Test List for SM120 (#11248)
Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com>
2026-02-04 11:59:04 -08:00
Simeng Liu
d9fd8cc951
[https://nvbugs/5674665][fix] Fix accuracy drop in VSWA with KV cache block reuse (#10875)
Signed-off-by: SimengLiu-nv <simengl@nvidia.com>
2026-02-04 12:46:31 -05:00
Gal Hubara-Agam
767b8dcab3
[None][chore] AutoDeploy: Set nanov3 and superv3 configs to use flashinfer ssm (#11183)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-04 09:46:15 -08:00
Grzegorz Kwasniewski
d90a8e5700
[TRTLLM-10673][feat] Improved layer classification for sharding (#10718)
Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com>
2026-02-04 18:06:10 +01:00
Lucas Liebenwein
925d911fc0
[#10966][feat] AutoDeploy: kv cache manager integration [2/2] (#11149)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-02-04 09:44:27 -05:00
Xianjie Qiao
e2bd9cce1e
[None][feat] Support disagg slurm jobs rescheduling (#11218)
2026-02-04 22:10:36 +08:00
Yueh-Ting (eop) Chen
f6fff18142
[https://nvbugs/5624818][fix] Work around accuracy issue by enforcing paged_context_fmha on Hopper for fmha_v2 (#11192)
Signed-off-by: eopXD <yuehtingc@nvidia.com>
2026-02-04 19:21:50 +08:00
Zhenhuan Chen
3d8c1a51bd
[None][feat] move some disagg script's env configs from bash to submit.py (#10223)
Signed-off-by: Zhenhuan Chen <zhenhuanc@nvidia.com>
2026-02-04 04:32:04 -05:00
mpikulski
f0ca62b175
[None][fix] make health_generate work with beam search (#11097)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-04 09:46:19 +01:00
xxi
02b80bfd58
[TRTLLM-9111][feat] provide the uniform test framework to test all MoE backends (#11128)
Signed-off-by: xxi <xxi@nvidia.com>
2026-02-04 15:57:56 +08:00
Gal Hubara-Agam
de6931bbfd
[None][fix] Fix selective_state_update perf regression for T=1 decode path (#11194)
Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
2026-02-04 09:01:34 +02:00
chenfeiz0326
04b7db3ab5
[TRTLLM-8263][feat] Add Disagg Perf Tests (#10912)
Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>
2026-02-04 10:16:11 +08:00
tburt-nv
588db0ed64
[None][chore] bump version to 1.3.0rc3 (#11238)
Signed-off-by: Tyler Burt <tburt@nvidia.com>
2026-02-04 09:30:45 +08:00
Dmitry Barsukoff
5d522295e9
[None][fix] Set continuous_usage_stats default to False to follow OpenAI protocol (#10644)
Signed-off-by: Dmitry Barsukoff <riZZZhik@gmail.com>
Co-authored-by: Kanghwan <861393+karljang@users.noreply.github.com>
2026-02-03 16:04:54 -08:00
Taylor Yeonbok Lee
f9e6045f39
[#11086][feat] Optimize Auto Deploy weight loading by preloading weights to CPU (#11059)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-03 13:23:10 -08:00
Lizhi Zhou
f9c4bdf6cf
[TRTLLM-8921][feat] implement gen-first disagg_service (#11020)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-02-03 15:46:11 -05:00
yuanjingx87
8f90330239
[TRTLLM-10019][infra] Move 6 h100 test stage to aihub platform (#11039)
Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>
2026-02-03 12:42:59 -08:00
mpikulski
710d6ef668
[https://nvbugs/5739981][fix] unwaive tests using opt-125M (#11100)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
2026-02-03 15:21:01 +01:00
Chenjie Luo
2532eb5adc
[None][fix] Align kv_scales with modelopt HF checkpoint (#10745)
Signed-off-by: Chenjie Luo <108829653+cjluo-nv@users.noreply.github.com>
2026-02-03 08:03:42 -05:00
xinhe-nv
20946554f6
[None][chore] Add failed cases into waives.txt (#11216)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-03 04:15:31 -05:00
Yiqing Yan
a56aaa585e
[TRTLLM-10839][infra] Set rerun report stage UNSTABLE and pipeline SUCCESS in post-merge when there are passed rerun tests (#11210)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-02-03 15:44:15 +08:00
xinhe-nv
b7767f682f
[None][chore] Add failed cases into waives.txt (#11202)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-03 02:26:02 -05:00
xinhe-nv
03f51bb767
[None][chore] Add failed cases into waives.txt (#11193)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>
2026-02-03 01:46:17 -05:00
Anish Shanbhag
e308eb50f4
[TRTLLM-10803][fix] Fix mocking of HuggingFace downloads in with_mocked_hf_download (#11200)
Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>
2026-02-02 21:58:15 -08:00
Taylor Yeonbok Lee
304dc6f3c0
[None][chore] Print memory usage before/after accuracy test in CI (#11155)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-02-03 00:23:14 -05:00
TensorRT LLM
12b4ebd0ad
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-02-03 03:13:38 +00:00
Abby Wei
061d7879d3
[TRTLLM-10307][infra] Add --high-priority in bot help message (#11133)
Signed-off-by: Abby Wei <mengzew@nvidia.com>
2026-02-03 10:35:05 +08:00
Yiqing Yan
13420178fc
[TRTLLM-10561][infra] Fix jaraco-context and wheel vulnerability (#10901)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-02-03 09:54:11 +08:00
Venky
897eb0df2b
[None][doc] Fix GLM4-MoE Eagle support documentation (#11198)
Signed-off-by: Venky Ganesh <23023424+venkywonka@users.noreply.github.com>
2026-02-02 13:36:09 -08:00
gramnarayan
585fbb2734
[#10826][feat] AutoDeploy: Eagle One-Model [2/n]: Prefill-Only Implementation (#11073)
Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>
2026-02-02 09:51:10 -08:00
Izzy Putterman
3ef8a4639b
[None][feat] Nemotron H: Eagle3 support (#11131)
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
2026-02-02 10:26:25 -05:00
Yanchao Lu
cd7762a2fa
[None][test] Fix an invalid test name (#11195)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-02-02 23:25:51 +08:00
Rundong Li
f1b85fea4c
[None][feat] Integrate cuda.tile RMS norm kernels (#9725)
Signed-off-by: Rundong (David) Li <davidli@nvidia.com>
Co-authored-by: Jinman Xie <jinmanx@nvidia.com>
Co-authored-by: Alexey Bylinkin <abylinkin@nvidia.com>
Co-authored-by: Qiqi Xiao <qiqix@nvidia.com>
Co-authored-by: Biao Wang <biaow@nvidia.com>
Co-authored-by: Thomas Schmid <thschmid@nvidia.com>
2026-02-02 19:44:27 +08:00
Mike Iovine
13b0ab9c0e
[None][fix] Fix MTP 1-model sampler (#10369)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00
Mike Iovine
d9aef94431
[https://nvbugs/5814914][fix] Fix llama sm120 spec dec (#10765)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-02-02 16:26:46 +08:00