TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Yukun He fd4311e6a3 [TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870)	2025-10-16 14:15:25 +08:00

Because we have encountered some perf regression due to using a one-shot kernel instead of NCCL on A100/H100, it will be beneficial if we can have a solid benchmarking of allreduce Op and analyze the data collected from it.

Implemented new AllreduceOp heuristic:
- Added Linear programming-based heuristic implementation.
- Added LUT-based heuristic implementation and corresponding code generation script.

AllreduceOp minor fixing:
- Fixed a minor issue in AllreduceOp, that the strategy can not be overridden when ONESHOT or TWOSHOT is set.
- Fixed a minor TWOSHOT kernel perf issue.
- Cleaned up Dispatching code in AllReduceOp.

This PR will fix the perf gaps reported in: https://nvbugspro.nvidia.com/bug/5517023

For Deepseek-R1, it shows a performance gain of about 3-4% in concurrency levels of 256 and 512.

Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
allReduceFusionKernels.cu	[TRTLLM-8129][feat] Allreduce tuning and benchmark script revising (#7870)	2025-10-16 14:15:25 +08:00
allReduceFusionKernels.h	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
allReduceWorkspace.cu	chore: bump version to 0.19.0 (#3598) (#3841)	2025-04-29 16:57:22 +08:00
allReduceWorkspace.h	feat: fix and improve allreduce and fusion kernels (#3064)	2025-04-08 19:33:52 +08:00
customLowPrecisionAllReduceKernels.cu	feat: Low Precision Allreduce for PCIe based GPU (#4344)	2025-05-20 06:53:46 +08:00
customLowPrecisionAllReduceKernels.h	feat: Low Precision Allreduce for PCIe based GPU (#4344)	2025-05-20 06:53:46 +08:00
mnnvlTwoShotAllreduceKernels.cu	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
mnnvlTwoShotAllreduceKernels.h	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
moeAllReduceFusionKernels.cu	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
moeAllReduceFusionKernels.h	[None] [feat] Add model gpt-oss (#6645)	2025-08-07 03:04:18 -04:00
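
The latest commit mentions a LUT-based heuristic for choosing an allreduce strategy. As a rough illustration of the general idea only, a lookup table can map a (tensor-parallel size, message-size bucket) pair to a strategy. Everything in this sketch is an assumption made for illustration: the enum, the function names, the bucket boundaries, and the table entries are hypothetical and are not taken from the TensorRT-LLM sources. A real table would presumably be generated from benchmark data.

```cpp
#include <cstddef>

// Hypothetical strategy labels. ONESHOT, TWOSHOT and NCCL are named in the
// commit message, but this enum is illustrative, not the project's own type.
enum class Strategy { NCCL, ONESHOT, TWOSHOT };

// Assumed bucket boundaries in bytes (placeholders, not measured values).
// A message falls into the first bucket whose upper bound it does not exceed;
// anything larger than the last bound lands in the final bucket.
constexpr std::size_t kNumBounds = 3;
constexpr std::size_t kBucketUpperBytes[kNumBounds] = {
    64u * 1024u, 1024u * 1024u, 16u * 1024u * 1024u};

// Placeholder table: rows are TP sizes {2, 4, 8}, columns are size buckets.
constexpr Strategy kLut[3][kNumBounds + 1] = {
    {Strategy::ONESHOT, Strategy::ONESHOT, Strategy::TWOSHOT, Strategy::NCCL},
    {Strategy::ONESHOT, Strategy::TWOSHOT, Strategy::TWOSHOT, Strategy::NCCL},
    {Strategy::ONESHOT, Strategy::TWOSHOT, Strategy::NCCL, Strategy::NCCL},
};

// Map a message size to its bucket (column) index.
inline std::size_t bucketIndex(std::size_t messageBytes)
{
    std::size_t i = 0;
    while (i < kNumBounds && messageBytes > kBucketUpperBytes[i])
    {
        ++i;
    }
    return i;
}

// Map a TP size to its row index, or -1 if the table has no row for it.
inline int tpRowIndex(int tpSize)
{
    switch (tpSize)
    {
    case 2: return 0;
    case 4: return 1;
    case 8: return 2;
    default: return -1;
    }
}

// Look up a strategy; fall back to NCCL for TP sizes the table does not cover.
inline Strategy selectStrategy(int tpSize, std::size_t messageBytes)
{
    int const row = tpRowIndex(tpSize);
    if (row < 0)
    {
        return Strategy::NCCL;
    }
    return kLut[row][bucketIndex(messageBytes)];
}
```

With these placeholder entries, selectStrategy(8, 1024) yields Strategy::ONESHOT, and a TP size the table does not cover falls back to Strategy::NCCL. The appeal of a table over hand-written thresholds is that it can be regenerated from new measurements without touching the dispatch logic.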