TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-05 02:31:33 +08:00

Author	SHA1	Message	Date
Chuang Zhu	d6f76d2fae	[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-27 16:34:17 +08:00
ZhichenJiang	fae4985797	[TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth (#10987 ) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>	2026-01-27 16:15:32 +08:00
Bo Li	6b251cc7fa	[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-27 15:55:07 +08:00
Lizhi Zhou	93ae8a14ab	[#10889 ][fix] fix pydantic deepcopy bug (#11004 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-27 02:40:13 -05:00
xinhe-nv	069ad30bdb	[None][chore] Remove closed bugs (#10982 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-27 15:35:44 +08:00
Yiqing Yan	ea5d811aec	[None][chore] Bump version to 1.3.0rc2 (#11021 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-27 15:26:03 +08:00
Emma Qiao	c761b68481	[None][infra] Waive failed cases for main on 01/27 (#11017 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-27 15:24:54 +08:00
zhhuang-nv	ca9f70f78c	[https://nvbugs/5612438 ][fix] Add timeout for SeedOSS test (#8683 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2026-01-27 15:22:21 +08:00
Tailing Yuan	5553391c5e	[TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler (#10943 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-27 13:18:34 +08:00
Wanli Jiang	4a206351bb	[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer (#10757 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-27 13:04:40 +08:00
TensorRT LLM	da43a28b01	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-27 03:23:36 +00:00
ameynaik-hub	df8be0c50c	[TRTLLM-10276][feat] Integrate cutedsl argmax kernel (#10476 ) Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com> Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com> Co-authored-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2026-01-26 22:08:47 -05:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754 ) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
tcherckez-nvidia	43b8a5561c	[None][chore] update AD model list (#10981 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2026-01-26 16:49:50 +02:00
Lucas Liebenwein	00f341be49	[#8982 ][feat] AutoDeploy attention dp support (#10728 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Linda	ce556290c9	[None][chore] Removing pybind11 bindings and references (#10550 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-26 08:19:12 -05:00
Pengyun Lin	ce37e27066	[#10614 ][fix] gpt_oss first iteration streaming in trtllm-serve (#10808 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-26 20:53:11 +08:00
Pengbo Wang	5d7a5e6800	[https://nvbugs/5779536 ][fix] Cherry-pick #10855 : Unwaive Llama 3.3 related multi GPU tests (#10942 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-26 05:40:29 -05:00
Bo Li	e405468230	[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-26 17:59:03 +08:00
Tian Zheng	5efee01da1	[None][feat] Add Skip Softmax MLA kernels for Blackwell and Fix an accuracy bug of NVFP4 KV (#10813 ) Signed-off-by: Tian Zheng <29906817+Tom-Zheng@users.noreply.github.com>	2026-01-26 16:46:33 +08:00
Emma Qiao	a3a3ceb17f	[None][infra] Waive failed case for main branch on 01/26 (#10994 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-26 03:20:53 -05:00
xinhe-nv	d3406cb515	[None][chore] Add failed cases into waives.txt (#10976 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-26 02:23:05 -05:00
yingguo-trt	c8f1745a6e	[https://nvbugs/5661741 ][feat] Add 250K-token NVFP4 MoE + PDL regression tests (#10911 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-26 01:48:29 -05:00
xinhe-nv	2d8245d125	[None][chore] Add failed cases into waives.txt (#10974 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-26 00:33:50 -05:00
TensorRT LLM	d2b5954aea	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-26 03:26:18 +00:00
Enwei Zhu	ffab217974	[None][fix] Fix CuteDSL MoE unittest (#10983 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-26 08:34:17 +08:00
Yanchao Lu	45d7022cc3	[None][test] Waive failed tests on main 1/25 (#10984 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-26 00:32:02 +08:00
Enwei Zhu	72ef732bcf	[TRTLLM-10147][perf] Balanced random MoE workload generator for CuteDSL kernel UT, autotuner and layerwise benchmark (#10279 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-25 21:02:30 +08:00
Pengyun Lin	fd7fd8c39d	[https://nvbugs/5747938 ][infra] Unwaive trtllm serve example test (#10820 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
dominicshanshan	c98c286c0f	[https://nvbugs/5814203 ][fix] Fix port 8000 being used issue in stress test. (#10756 ) Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Yanchao Lu	ae58a7ed20	[None][chore] Revert NVIDIA/TensorRT-LLM#10819 (#10870 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Ivy Zhang	bcd2dc490c	[None][test] Update case for release (#10811 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Yanchao Lu	18f63dfcec	[None][chore] Reduce tedious logs (#10819 ) Signed-off-by: Yanchao Lu <yanchaol@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Emma Qiao	44aa6c3b8e	[None][infra] Waive failed cases for release branch on 01/20 (#10828 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
mpikulski	0f7ec033f7	[https://nvbugs/5791242 ][fix] workaround for flashinfer.sampling.sampling_from_logits (#10713 ) Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Patrice Castonguay	8959c41d8b	[https://nvbugs/5748664 ][fix] Increasing disagg acc test timeout (#10764 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Ivy Zhang	4ebc1b1596	[None][test] Update test case for release (#10763 ) Signed-off-by: Ivy Zhang <25222398+crazydemo@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
ruodil	4df0ca8bd1	[None][test] modify ctx config in 128k8k disagg cases (#10779 ) Signed-off-by: Ruodi Lu <ruodil@users.noreply.github.com> Co-authored-by: Ruodi Lu <ruodil@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Emma Qiao	af49fbdf65	[None][infra] Waive failed case for release branch on 01/19 (#10795 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Yukun He	25bdc30162	[https://nvbugs/5782112 ][fix] Cherry-pick #10633 : Fix hanging issue for MNNVL Allreduce under PP (#10750 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Yuxian Qiu	2b3bb2e9b0	[https://nvbugs/5811697 ][fix] Fix buffer reuse. (#10716 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Emma Qiao	4b833492fb	[None][infra] Waive failed cases for release on 10/18 (#10781 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Faraz	aa410c57bc	[TRTLLM-5366][chore] Add dgx-spark beta notes (#10766 ) Signed-off-by: Faraz Khoubsirat <58580514+farazkh80@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Mike Iovine	f02948d956	[https://nvbugs/5803813 ][fix] Fix llama 4 min latency (#10724 ) Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
Patrice Castonguay	93e7ae73ea	[None][doc] 1.2 Release Notes Headers (#10722 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Signed-off-by: Wangshanshan <30051912+dominicshanshan@users.noreply.github.com>	2026-01-25 18:12:21 +08:00
TensorRT LLM	0c393ebc69	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-25 03:04:42 +00:00
Patrice Castonguay	d548b29a41	[None][fix] Bugfix/mtp with async scheduler (#10941 ) Signed-off-by: Patrice Castonguay <55748270+pcastonguay@users.noreply.github.com> Co-authored-by: rongwei <scutizhang@tencent.com>	2026-01-24 07:19:54 -05:00
Yao Yao	6f07fa81d7	[TRTLLM-7738][feat] Adding implementation of KVCacheManagerV2 (#10736 ) Signed-off-by: Yao Yao <lowsfer@users.noreply.github.com> KVCacheManagerV2 is a new python-based implementation of the KV cache manager, featuring cleaner API, better abstraction and better code quality without the accumulated legacy.	2026-01-24 04:48:39 -05:00
Yuxian Qiu	9fcc93ea7b	[https://nvbugs/5829097 ][fix] Re-init TRTLLM sampler to use sample stream in multi-stream cases. (#10918 ) Signed-off-by: Yuxian Qiu <142763828+yuxianq@users.noreply.github.com>	2026-01-24 14:04:10 +08:00
Emma Qiao	9d65b8bf24	[None][infra] Fix TRT-LLM data scratch mount point for gb10x (#10880 ) Signed-off-by: qqiao <qqiao@nvidia.com> Signed-off-by: Emma Qiao <qqiao@nvidia.com> Co-authored-by: Yanchao Lu <yanchaol@nvidia.com>	2026-01-24 14:00:17 +08:00

4850 Commits