Commit Graph

4850 Commits

Author SHA1 Message Date
Chuang Zhu
d6f76d2fae
[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495)
Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>
2026-01-27 16:34:17 +08:00
ZhichenJiang
fae4985797
[TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth (#10987)
Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>
2026-01-27 16:15:32 +08:00
Bo Li
6b251cc7fa
[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-27 15:55:07 +08:00
Lizhi Zhou
93ae8a14ab
[#10889][fix] fix pydantic deepcopy bug (#11004)
Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>
2026-01-27 02:40:13 -05:00
xinhe-nv
069ad30bdb
[None][chore] Remove closed bugs (#10982)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-27 15:35:44 +08:00
Yiqing Yan
ea5d811aec
[None][chore] Bump version to 1.3.0rc2 (#11021)
Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>
2026-01-27 15:26:03 +08:00
Emma Qiao
c761b68481
[None][infra] Waive failed cases for main on 01/27 (#11017)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-27 15:24:54 +08:00
zhhuang-nv
ca9f70f78c
[https://nvbugs/5612438][fix] Add timeout for SeedOSS test (#8683)
Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>
2026-01-27 15:22:21 +08:00
Tailing Yuan
5553391c5e
[TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler (#10943)
Signed-off-by: Tailing Yuan <yuantailing@gmail.com>
2026-01-27 13:18:34 +08:00
Wanli Jiang
4a206351bb
[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer (#10757)
Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>
2026-01-27 13:04:40 +08:00
TensorRT LLM
da43a28b01 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-27 03:23:36 +00:00
ameynaik-hub
df8be0c50c
[TRTLLM-10276][feat] Integrate cutedsl argmax kernel (#10476)
Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com>
Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
Co-authored-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>
2026-01-26 22:08:47 -05:00
sunnyqgg
ff0dd6076e
[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754)
Signed-off-by: qgai <qgai@nvidia.com>
2026-01-26 11:23:26 -05:00
tcherckez-nvidia
43b8a5561c
[None][chore] update AD model list (#10981)
Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>
2026-01-26 16:49:50 +02:00
Lucas Liebenwein
00f341be49
[#8982][feat] AutoDeploy attention dp support (#10728)
Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>
2026-01-26 09:43:33 -05:00
Linda
ce556290c9
[None][chore] Removing pybind11 bindings and references (#10550)
Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>
2026-01-26 08:19:12 -05:00
Pengyun Lin
ce37e27066
[#10614][fix] gpt_oss first iteration streaming in trtllm-serve (#10808)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
2026-01-26 20:53:11 +08:00
Pengbo Wang
5d7a5e6800
[https://nvbugs/5779536][fix] Cherry-pick #10855: Unwaive Llama 3.3 related multi GPU tests (#10942)
Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>
2026-01-26 05:40:29 -05:00
Bo Li
e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
Tian Zheng
5efee01da1
[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813)
Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>
2026-01-26 16:46:33 +08:00
Emma Qiao
a3a3ceb17f
[None][infra] Waive failed case for main branch on 01/26 (#10994)
Signed-off-by: qqiao <qqiao@nvidia.com>
2026-01-26 03:20:53 -05:00
xinhe-nv
d3406cb515
[None][chore] Add failed cases into waives.txt (#10976)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-26 02:23:05 -05:00
yingguo-trt
c8f1745a6e
[https://nvbugs/5661741][feat] Add 250K-token NVFP4 MoE + PDL regression tests (#10911)
Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>
2026-01-26 01:48:29 -05:00
xinhe-nv
2d8245d125
[None][chore] Add failed cases into waives.txt (#10974)
Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>
2026-01-26 00:33:50 -05:00
TensorRT LLM
d2b5954aea [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-26 03:26:18 +00:00
Enwei Zhu
ffab217974
[None][fix] Fix CuteDSL MoE unittest (#10983)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-26 08:34:17 +08:00
Yanchao Lu
45d7022cc3
[None][test] Waive failed tests on main 1/25 (#10984)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-26 00:32:02 +08:00
Enwei Zhu
72ef732bcf
[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279)
Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>
2026-01-25 21:02:30 +08:00
Pengyun Lin
fd7fd8c39d [https://nvbugs/5747938][infra] Unwaive trtllm serve example test (#10820)
Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
dominicshanshan
c98c286c0f [https://nvbugs/5814203][fix] Fix port 8000 being used issue in stress test. (#10756)
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yanchao Lu
ae58a7ed20 [None][chore] Revert NVIDIA/TensorRT-LLM#10819 (#10870)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
bcd2dc490c [None][test] Update case for release (#10811)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yanchao Lu
18f63dfcec [None][chore] Reduce tedious logs (#10819)
Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
44aa6c3b8e [None][infra] Waive failed cases for release branch on 01/20 (#10828)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
mpikulski
0f7ec033f7 [https://nvbugs/5791242][fix] workaround for flashinfer.sampling.sampling_from_logits (#10713)
Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Patrice Castonguay
8959c41d8b [https://nvbugs/5748664][fix] Increasing disagg acc test timeout (#10764)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Ivy Zhang
4ebc1b1596 [None][test] Update test case for release (#10763)
Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
ruodil
4df0ca8bd1 [None][test] modify ctx config in 128k8k disagg cases (#10779)
Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com>
Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
af49fbdf65 [None][infra] Waive failed case for release branch on 01/19 (#10795)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yukun He
25bdc30162 [https://nvbugs/5782112][fix] Cherry-pick #10633: Fix hanging issue for MNNVL Allreduce under PP (#10750)
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Yuxian Qiu
2b3bb2e9b0 [https://nvbugs/5811697][fix] Fix buffer reuse. (#10716)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Emma Qiao
4b833492fb [None][infra] Waive failed cases for release on 10/18 (#10781)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Faraz
aa410c57bc [TRTLLM-5366][chore] Add dgx-spark beta notes (#10766)
Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Mike Iovine
f02948d956 [https://nvbugs/5803813][fix] Fix llama 4 min latency (#10724)
Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
Patrice Castonguay
93e7ae73ea [None][doc] 1.2 Release Notes Headers (#10722)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>
2026-01-25 18:12:21 +08:00
TensorRT LLM
0c393ebc69 [None][infra] Check in most recent lock file from nightly pipeline
Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>
2026-01-25 03:04:42 +00:00
Patrice Castonguay
d548b29a41
[None][fix] Bugfix/mtp with async scheduler (#10941)
Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com>
Co-authored-by: rongwei <scutizhang@tencent.com>
2026-01-24 07:19:54 -05:00
Yao Yao
6f07fa81d7
[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736)
Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com>

KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.
2026-01-24 04:48:39 -05:00
Yuxian Qiu
9fcc93ea7b
[https://nvbugs/5829097][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (#10918)
Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>
2026-01-24 14:04:10 +08:00
Emma Qiao
9d65b8bf24
[None][infra] Fix TRT-LLM data scratch mount point for gb10x (#10880)
Signed-off-by: qqiao <qqiao@nvidia.com>
Signed-off-by: Emma Qiao <qqiao@nvidia.com>
Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>
2026-01-24 14:00:17 +08:00