Commit Graph

4844 Commits

Author SHA1 Message Date
Emma Qiao
c761b68481
[None][infra] Waive failed cases for main on 01/27 (#11017)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-27 15:24:54 +08:00
zhhuang-nv
ca9f70f78c
[https://nvbugs/5612438][fix] Add timeout for SeedOSS test (#8683)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2026-01-27 15:22:21 +08:00
Tailing Yuan
5553391c5e
[TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler (#10943)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-27 13:18:34 +08:00
Wanli Jiang
4a206351bb
[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer (#10757)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-27 13:04:40 +08:00
TensorRT LLM
da43a28b01
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-27 03:23:36 +00:00
ameynaik-hub
df8be0c50c
[TRTLLM-10276][feat] Integrate cutedsl argmax kernel (#10476)
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
Co-authored-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-01-26 22:08:47 -05:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
tcherckez-nvidia
43b8a5561c
[None][chore] update AD model list (#10981)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-26 16:49:50 +02:00
Lucas Liebenwein
00f341be49
[#8982][feat] AutoDeploy attention dp support (#10728)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Linda
ce556290c9
[None][chore] Removing pybind11 bindings and references (#10550)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-26 08:19:12 -05:00
Pengyun Lin
ce37e27066
[#10614][fix] gpt_oss first iteration streaming in trtllm-serve (#10808)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-26 20:53:11 +08:00
Pengbo Wang
5d7a5e6800
[https://nvbugs/5779536][fix] Cherry-pick #10855: Unwaive Llama 3.3 related multi GPU tests (#10942)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-26 05:40:29 -05:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Emma Qiao
a3a3ceb17f
[None][infra] Waive failed case for main branch on 01/26 (#10994)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-26 03:20:53 -05:00
xinhe-nv
d3406cb515
[None][chore] Add failed cases into waives.txt (#10976)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-26 02:23:05 -05:00
yingguo-trt
c8f1745a6e
[https://nvbugs/5661741][feat] Add 250K-token NVFP4 MoE + PDL regression tests (#10911)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-26 01:48:29 -05:00
xinhe-nv
2d8245d125
[None][chore] Add failed cases into waives.txt (#10974)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-26 00:33:50 -05:00
TensorRT LLM
d2b5954aea
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-26 03:26:18 +00:00
Enwei Zhu
ffab217974
[None][fix] Fix CuteDSL MoE unittest (#10983)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-26 08:34:17 +08:00
Yanchao Lu
45d7022cc3
[None][test] Waive failed tests on main 1/25 (#10984)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-26 00:32:02 +08:00
Enwei Zhu
72ef732bcf
[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-25 21:02:30 +08:00
Pengyun Lin
fd7fd8c39d
[https://nvbugs/5747938][infra] Unwaive trtllm serve example test (#10820)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
dominicshanshan
c98c286c0f
[https://nvbugs/5814203][fix] Fix port 8000 being used issue in stress test. (#10756)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yanchao Lu
ae58a7ed20
[None][chore] Revert NVIDIA/TensorRT-LLM#10819 (#10870)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
bcd2dc490c
[None][test] Update case for release (#10811)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yanchao Lu
18f63dfcec
[None][chore] Reduce tedious logs (#10819)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
44aa6c3b8e
[None][infra] Waive failed cases for release branch on 01/20 (#10828)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
mpikulski
0f7ec033f7
[https://nvbugs/5791242][fix] workaround for flashinfer.sampling.sampling_from_logits (#10713)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Patrice Castonguay
8959c41d8b
[https://nvbugs/5748664][fix] Increasing disagg acc test timeout (#10764)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
4ebc1b1596
[None][test] Update test case for release (#10763)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
ruodil
4df0ca8bd1
[None][test] modify ctx config in 128k8k disagg cases (#10779)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
af49fbdf65
[None][infra] Waive failed case for release branch on 01/19 (#10795)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yukun He
25bdc30162
[https://nvbugs/5782112][fix] Cherry-pick #10633: Fix hanging issue for MNNVL Allreduce under PP (#10750)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yuxian Qiu
2b3bb2e9b0
[https://nvbugs/5811697][fix] Fix buffer reuse. (#10716)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
4b833492fb
[None][infra] Waive failed cases for release on 10/18 (#10781)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Faraz
aa410c57bc
[TRTLLM-5366][chore] Add dgx-spark beta notes (#10766)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Mike Iovine
f02948d956
[https://nvbugs/5803813][fix] Fix llama 4 min latency (#10724)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Patrice Castonguay
93e7ae73ea
[None][doc] 1.2 Release Notes Headers (#10722)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
TensorRT LLM
0c393ebc69
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-25 03:04:42 +00:00
Patrice Castonguay
d548b29a41
[None][fix] Bugfix/mtp with async scheduler (#10941)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: rongwei <scutizhang@tencent.com>
2026-01-24 07:19:54 -05:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Yuxian Qiu
9fcc93ea7b
[https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (#10918)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-24 14:04:10 +08:00
Emma Qiao
9d65b8bf24
[None][infra] Fix TRT-LLM data scratch mount point for gb10x (#10880)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-24 14:00:17 +08:00
Yanchao Lu
78a008d61a
[None][ci] Remove long-running sanity check tests on GH200 (#10924) (#10969)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-24 13:06:28 +08:00
Kaiyu Xie
da967d0bd7
[TRTLLM-10334] [feat] Support overlap scheduler for disagg ctx instances (#10755)
Signed-off-by: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
2026-01-23 22:29:37 -05:00
TensorRT LLM
58dc4bea9c
[None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-24 03:03:08 +00:00
jthomson04
cf88da7eca
[None][feat] KV Connector Support for MTP (#10932)
Signed-off-by: jthomson04 <jwillthomson19@gmail.com>
Co-authored-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
2026-01-23 18:58:26 -05:00
Taylor Yeonbok Lee
1fbbb1f3cd
[None][feat] AutoDeploy: Enhance memory consumption for MoE fusion transform (#10772)
Signed-off-by: Taylor Yeonbok Lee <249374542+taylor-yb-lee@users.noreply.github.com>
2026-01-23 15:22:54 -08:00
Jin Li
b560598c79
[https://nvbugs/5707359][fix] Unwaive the test that due to flashinfer… (#10570)
Signed-off-by: Jin Li <59594262+liji-nv@users.noreply.github.com>
2026-01-23 13:09:04 -05:00