TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-13 22:18:36 +08:00

Author	SHA1	Message	Date
Anish Shanbhag	dacc881993	[https://nvbugs/5761391][fix] Use correct model names for config database regression tests (#10192) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2026-01-12 10:55:07 -08:00
chenfeiz0326	54459377d2	[TRTLLM-10248][feat] Support Bot to Send Perf Regression Msg to Slack Channel (#10489) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2026-01-12 14:23:23 +08:00
Zongfei Jing	bb2f883296	[None] [feat] Add test script and raster M for gather fc1 kernel (#10429) Signed-off-by: Zongfei Jing <20381269+zongfeijing@users.noreply.github.com>	2026-01-07 09:31:49 +08:00
alel	6b8ae6fa81	[None][feat] CuteDSL MOE FC1 Enhancement (#10088) Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>	2026-01-06 09:30:43 +08:00
Yukun He	d272f1a9bc	[TRTLLM-8821][feat] Apply AutoTuner to AllReduce Op for strategy tuning. (#8531) Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>	2026-01-05 15:44:37 +08:00
chenfeiz0326	a23c6f1092	[TRTLLM-9834][feat] Transfer to TRTLLM-INFRA Database and Fail post-merge tests if regression (#10282) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-31 21:44:59 +08:00
chenfeiz0326	d70aeddc7f	[TRTLLM-8952][feat] Support Multi-Node Disagg Perf Test in CI (#9138) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-26 22:50:53 +08:00
ZhichenJiang	46e4af5688	[TRTLLM-9831][perf] Enable 2CTA with autotune for CuteDSL MoE and Grouped GEMM optimizations (#10201) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com> Signed-off-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com> Co-authored-by: Enwei Zhu <21126786+syuoni@users.noreply.github.com>	2025-12-25 09:04:20 -05:00
Anish Shanbhag	7c82605327	[None][fix] enable KV cache reuse for config database (#10094)	2025-12-19 15:16:56 -08:00
Anish Shanbhag	91a9ae42d2	[TRTC-71][feat] Add regression testing for config database (#9832) Signed-off-by: Anish Shanbhag <ashanbhag@nvidia.com>	2025-12-18 16:15:38 -08:00
ZhichenJiang	4e55b83101	[None][perf] Add more optimization options for MOE CuteDSL finalized kernel (#10042) Signed-off-by: zhichen jiang <zhichenj@NVIDIA.com>	2025-12-18 22:49:28 +08:00
Lizhi Zhou	bd13957e70	[TRTLLM-9181][feat] improve disagg-server prometheus metrics; synchronize workers' clocks when workers are dynamic (#9726) Signed-off-by: Lizhi Zhou <1432185+reasonsolo@users.noreply.github.com>	2025-12-16 05:16:32 -08:00
tburt-nv	6147452158	[https://nvbugs/4141427][chore] Add more details to LICENSE file (#9881) Signed-off-by: Tyler Burt <195370667+tburt-nv@users.noreply.github.com>	2025-12-13 08:35:31 +08:00
chenfeiz0326	61745f034a	[https://nvbugs/5727481][ci] Fix Port Conflict in Perf-Sanity CI Test (#9896) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-12 17:16:50 +08:00
chenfeiz0326	383178c00a	[TRTLLM-9000][feat] Add multi-node Perf Tests into CI (#8800) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-12-08 09:00:44 +08:00
Ludwig Schneider	41ce14ab04	[None][feat] Enable NCCL_SYMMETRIC as default fallback for AllReduce (#9314) Signed-off-by: Ludwig Schneider <lschneider@nvidia.com>	2025-12-07 09:43:26 -08:00
alel	4107254c82	[TRTLLM-6222][feat] Several perf opt for cuteDSL nvf4 gemm (#9428) Signed-off-by: Yuhan Li <51736452+liyuhannnnn@users.noreply.github.com>	2025-12-01 18:10:45 +08:00
Liao Lanyu	bf84d9cea1	[None][chore] add spec_decoding configs in perf benchmark scripts and fix typos (#9533) Signed-off-by: Lanyu Liao <lancelly@users.noreply.github.com> Co-authored-by: Lanyu Liao <lancelly@users.noreply.github.com>	2025-11-28 14:52:05 +08:00
Yukun He	2225745782	[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870) Because we have encountered some perf regression due to using a one-shot kernel instead of NCCL on A100/H100, it will be beneficial if we can have a solid benchmarking of allreduce Op and analyze the data collected from it. Implemented new AllreduceOp heuristic: - Added Linear programming-based heuristic implementation. - Added LUT-based heuristic implementation and corresponding code generation script. AllreduceOp minor fixing: - Fixed a minor issue in AllreduceOp, that the strategy can not be overridden when ONESHOT or TWOSHOT is set. - Fixed a minor TWOSHOT kernel perf issue. - Cleaned up Dispatching code in AllReduceOp. This PR will fix the perf gaps reported in: https://nvbugspro.nvidia.com/bug/5517023 For Deepseek-R1, it shows a performance gain of about 3-4% in concurrency levels of 256 and 512. Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com> Signed-off-by: Mike Iovine <6158008+mikeiovine@users.noreply.github.com>	2025-11-04 16:42:31 +08:00
chenfeiz0326	cc4ab8d9d1	[TRTLLM-8825][feat] Support Pytest Perf Results uploading to Database (#8653) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-11-03 16:23:13 +08:00
chenfeiz0326	6cf1c3fba4	[TRTLLM-8260][feat] Add Server-Client Perf Test in pytest for B200 and B300 (#7985) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com>	2025-10-22 10:17:22 +08:00
Yi Zhang	3c2b3bd4d4	[TRTLLM-7255][feat] Add iteration log parser script for benchmark log (#6942) Signed-off-by: Yi Zhang <187001205+yizhang-nv@users.noreply.github.com>	2025-10-20 01:34:52 -04:00
Daniel Cámpora	53312eeebd	[TRTLLM-7157][feat] BREAKING CHANGE Introduce sampler_type, detect sampler according to options (#6831) Signed-off-by: Daniel Campora <961215+dcampora@users.noreply.github.com>	2025-08-16 00:27:24 -04:00
chenfeiz0326	5cd8c0f6cc	[None][test] Add perf-sweep scripts (#6738) Signed-off-by: Chenfei Zhang <chenfeiz@nvidia.com> Signed-off-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com> Co-authored-by: Perkz Zheng <67892460+PerkzZheng@users.noreply.github.com>	2025-08-14 14:04:47 +08:00

24 Commits