TensorRT-LLMs

mirror of https://github.com/NVIDIA/TensorRT-LLM.git synced 2026-01-14 06:27:45 +08:00

Simeng Liu bb766eca0a feat: Reduce branch overhead in groupRMSNorm kernels (#4067)	2025-05-08 00:55:27 +08:00
* feat: Reduce branch overhead in groupRMSNorm kernels
* Fix race condition with sm < 90 and avoid all threads in one warp writing to the same shared memory.
Signed-off-by: Simeng Liu <simengl@nvidia.com>
CMakeLists.txt	feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438)	2025-05-02 13:25:30 +08:00
groupRmsNormKernels.cu	feat: Reduce branch overhead in groupRMSNorm kernels (#4067)	2025-05-08 00:55:27 +08:00
groupRmsNormKernels.h	feat: Add group_rms_norm kernel to normalize multiple inputs in a single operator. (#3438)	2025-05-02 13:25:30 +08:00