TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-02-05 02:31:33 +08:00

Author	SHA1	Message	Date
Stefan Niebler	7d31532850	[TRTLLM-10312][perf] Improve performance of _write_finish_reasons in TorchSampler (#10459 ) Signed-off-by: Stefan Niebler <82932102+stnie@users.noreply.github.com>	2026-01-29 11:06:09 -05:00
WeiHaocheng	80dd6e70c6	[TRTLLM-10415][feat] Dump thread stacks for hanging tests before time… (#10708 ) Signed-off-by: Fred Wei <20514172+WeiHaocheng@users.noreply.github.com>	2026-01-29 20:43:34 +08:00
Balaram Buddharaju	c7a86f89de	[TRTLLM-10264][feat] Support attention DP + Helix CP (#10477 ) Signed-off-by: Balaram Buddharaju <169953907+brb-nv@users.noreply.github.com>	2026-01-29 02:57:13 -05:00
Zhanrui Sun	21d475a391	[None][infra] Waived flaky tests (#11091 ) Signed-off-by: ZhanruiSunCh <184402041+ZhanruiSunCh@users.noreply.github.com>	2026-01-29 02:18:30 -05:00
Yi Sun	f6dab8388d	[https://nvbugs/5813452 ][fix] Fix "Assertion failed: isLeaf() in kvCacheManager.cpp:465" (#10922 ) Signed-off-by: Yi Sun <yisun0618@gmail.com>	2026-01-29 14:38:11 +08:00
Tailing Yuan	91528365a9	[None][feat] Add performance alignment to layer-wise benchmarks (#11018 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-29 14:01:51 +08:00
Enwei Zhu	34a730aaf7	[None][fix] Fix enable_alltoall passed to CutlassFusedMoE (#11016 ) Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2026-01-29 12:11:07 +08:00
Anish Shanbhag	24ac86c485	[https://nvbugs/5761391 ][fix] Include triton-kernels as a packaged dependency (#10471 ) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-28 19:56:32 -08:00
TensorRT LLM	e20f9a9c72	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-29 03:20:49 +00:00
Yiqing Yan	6fcbf15fb8	[None][fix] No need to remove the original waive list (#11060 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-29 11:10:38 +08:00
Frida Hou	f03908cf9e	[None][fix] fix Qwen2/3 export for AutoDeploy (#11007 ) Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>	2026-01-28 16:53:21 -08:00
Ludwig Schneider	4e10bf8950	[None][fix] nccl symmetric with graceful fallbacks (#11042 ) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2026-01-28 15:43:24 -08:00
Bala Marimuthu	393c3d259e	[#10245 ][feat] AutoDeploy: Add Minimax M2 support (#10525 ) Signed-off-by: Balamurugan Marimuthu <246387390+bmarimuthu-nv@users.noreply.github.com>	2026-01-28 17:22:32 -05:00
gramnarayan	744a955cbb	[None][chore] AutoDeploy: Eagle One-Model [1/n]: PyTorch impl for Eagle3 Llama checkpoint (#10674 ) Signed-off-by: Govind Ramnarayan <105831528+govind-ramnarayan@users.noreply.github.com>	2026-01-28 12:10:49 -08:00
Emma Qiao	0ffa77af51	[None][infra] Waive failed cases for main on 1/28 (#11053 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-28 06:11:06 -05:00
yingguo-trt	e70a55bd94	[None][feat] support multi_acc and Lyris GB200 test (#11024 ) Signed-off-by: yingguo-trt <244492186+yingguo-trt@users.noreply.github.com>	2026-01-28 06:01:48 -05:00
Linda	29647d9446	[None][chore] Removing cpp/tensorrt_llm/pybind (#11026 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-28 11:25:11 +01:00
Grzegorz Kwasniewski	38bcee189c	[TRTLLM-10362][feat] Added Mamba and MLA layers to the sharding tests (#10364 ) Signed-off-by: greg-kwasniewski1 <213329731+greg-kwasniewski1@users.noreply.github.com> Signed-off-by: Grzegorz Kwasniewski <213329731+greg-kwasniewski1@users.noreply.github.com>	2026-01-28 10:34:10 +01:00
yuanjingx87	3e17ee4e38	[None][infra] Update CI allowList (#11040 ) Signed-off-by: Yuanjing Xue <197832395+yuanjingx87@users.noreply.github.com>	2026-01-28 00:54:37 -08:00
Pengbo Wang	d008494232	[https://nvbugs/5779536 ][fix] Cherry-pick #10902 : Unwaive DeepSeekR1 nvfp4 pp4 mtp test case (#10902 ) (#11000 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-28 14:18:53 +08:00
xinhe-nv	dc5eda546b	[None][fix] unwaive tests (#11047 ) Signed-off-by: Xin He (SW-GPU) <200704525+xinhe-nv@users.noreply.github.com>	2026-01-27 23:49:07 -05:00
TensorRT LLM	a7748ceb57	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-28 03:36:53 +00:00
dongfengy	1c2e415b3a	[https://nvbugs/5756804 ][fix] Re-enable passing test (#10986 ) Signed-off-by: Dongfeng Yu <dongfengy@nvidia.com> Signed-off-by: dongfengy <99041270+dongfengy@users.noreply.github.com>	2026-01-28 11:23:43 +08:00
Yuan Tong	30348b2753	[None][fix] Proper conditional compilation of sm10x cubins (#10839 ) Signed-off-by: Yuan Tong <13075180+tongyuantongyu@users.noreply.github.com>	2026-01-28 10:17:51 +08:00
Matt Lefebvre	c26a8f764c	[TRTINFRA-7379][infra] Change SLURM config access to use resolvePlatform (#11006 ) Signed-off-by: Matt Lefebvre <mlefebvre@nvidia.com>	2026-01-27 12:33:16 -08:00
NVShreyas	6c1862fb33	[TRTLLM-10197][chore] Refactor to setup for RNN cache transceiver (#10957 ) Signed-off-by: Shreyas Misra <shreyasm@nvidia.com>	2026-01-27 12:23:02 -08:00
Evgueni Petrov	f25a2c53bb	[#10877 ][fix] restore ipv6 support in serve.py (#10929 ) Signed-off-by: Evgueni Petrov <evgueni.s.petrov@gmail.com>	2026-01-27 11:55:59 -08:00
Simeng Liu	bae2fac834	[https://nvbugs/5721661 ][chore] Unwaive fixed bug. (#11009 ) Signed-off-by: SimengLiu-nv <simengl@nvidia.com>	2026-01-27 11:41:48 -08:00
Lucas Liebenwein	ff3a494f5c	[#10013 ][feat] AutoDeploy: native cache manager integration (#10635 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-27 11:23:22 -05:00
Gal Hubara-Agam	7f8c260601	[https://nvbugs/5843316 ][chore] waive overlap_scheduler test (#11025 ) Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>	2026-01-27 09:07:52 -05:00
xinhe-nv	552aa32aa2	[None][chore] Add failed cases into waives.txt (#10993 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com> Signed-off-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com> Co-authored-by: Jie Li <76780849+jieli-matrix@users.noreply.github.com>	2026-01-27 06:08:11 -05:00
Yukun He	b575184fca	[TRTLLM-10308][feat] AutoTuner Cache: reorganize cache file for distributed tuning (#10956 ) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-27 16:39:40 +08:00
Chuang Zhu	d6f76d2fae	[TRTLLM-9527][feat] change context params and disagg params (step3) (#10495 ) Signed-off-by: Chuang Zhu <111838961+chuangz0@users.noreply.github.com>	2026-01-27 16:34:17 +08:00
ZhichenJiang	fae4985797	[TRTLLM-9831][perf] Use TMA.RED to improve effective memory bandwidth (#10987 ) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>	2026-01-27 16:15:32 +08:00
Bo Li	6b251cc7fa	[TRTLLM-9390][chore] Add Fake OPs for One-Sided AlltoAll. (#11002 ) Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>	2026-01-27 15:55:07 +08:00
Lizhi Zhou	93ae8a14ab	[#10889 ][fix] fix pydantic deepcopy bug (#11004 ) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2026-01-27 02:40:13 -05:00
xinhe-nv	069ad30bdb	[None][chore] Remove closed bugs (#10982 ) Signed-off-by: xinhe-nv <200704525+xinhe-nv@users.noreply.github.com>	2026-01-27 15:35:44 +08:00
Yiqing Yan	ea5d811aec	[None][chore] Bump version to 1.3.0rc2 (#11021 ) Signed-off-by: Yiqing Yan <yiqingy@nvidia.com>	2026-01-27 15:26:03 +08:00
Emma Qiao	c761b68481	[None][infra] Waive failed cases for main on 01/27 (#11017 ) Signed-off-by: qqiao <qqiao@nvidia.com>	2026-01-27 15:24:54 +08:00
zhhuang-nv	ca9f70f78c	[https://nvbugs/5612438 ][fix] Add timeout for SeedOSS test (#8683 ) Signed-off-by: Zhen Huang <145532724+zhhuang-nv@users.noreply.github.com>	2026-01-27 15:22:21 +08:00
Tailing Yuan	5553391c5e	[TRTLLM-10560][fix] Fix the time of pause() for overlap scheduler (#10943 ) Signed-off-by: Tailing Yuan <yuantailing@gmail.com>	2026-01-27 13:18:34 +08:00
Wanli Jiang	4a206351bb	[TRTLLM-10453][feat] Update mamba decode kernel to flashinfer (#10757 ) Signed-off-by: Wanli Jiang <35160485+Wanli-Jiang@users.noreply.github.com>	2026-01-27 13:04:40 +08:00
TensorRT LLM	da43a28b01	[None][infra] Check in most recent lock file from nightly pipeline Signed-off-by: TensorRT LLM <90828364+tensorrt-cicd@users.noreply.github.com>	2026-01-27 03:23:36 +00:00
ameynaik-hub	df8be0c50c	[TRTLLM-10276][feat] Integrate cutedsl argmax kernel (#10476 ) Signed-off-by: Amey Naik <212485788+ameynaik-hub@users.noreply.github.com> Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com> Co-authored-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2026-01-26 22:08:47 -05:00
sunnyqgg	ff0dd6076e	[TRTLLM-10062][feat] Enable MTP for Nemotron Super (#10754 ) Signed-off-by: qgai <qgai@nvidia.com>	2026-01-26 11:23:26 -05:00
tcherckez-nvidia	43b8a5561c	[None][chore] update AD model list (#10981 ) Signed-off-by: Tal Cherckez <127761168+tcherckez-nvidia@users.noreply.github.com>	2026-01-26 16:49:50 +02:00
Lucas Liebenwein	00f341be49	[#8982 ][feat] AutoDeploy attention dp support (#10728 ) Signed-off-by: Lucas Liebenwein <11156568+lucaslie@users.noreply.github.com>	2026-01-26 09:43:33 -05:00
Linda	ce556290c9	[None][chore] Removing pybind11 bindings and references (#10550 ) Signed-off-by: Linda-Stadter <57756729+Linda-Stadter@users.noreply.github.com>	2026-01-26 08:19:12 -05:00
Pengyun Lin	ce37e27066	[#10614 ][fix] gpt_oss first iteration streaming in trtllm-serve (#10808 ) Signed-off-by: Pengyun Lin <81065165+LinPoly@users.noreply.github.com>	2026-01-26 20:53:11 +08:00
Pengbo Wang	5d7a5e6800	[https://nvbugs/5779536 ][fix] Cherry-pick #10855 : Unwaive Llama 3.3 related multi GPU tests (#10942 ) Signed-off-by: Pengbo Wang <221450789+pengbowang-nv@users.noreply.github.com>	2026-01-26 05:40:29 -05:00

4882 Commits