TensorRT-LLMs/cpp/tensorrt_llm/kernels/communicationKernels
Bo Li e405468230
[TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885)
Signed-off-by: Bo Li <22713281+bobboli@users.noreply.github.com>
2026-01-26 17:59:03 +08:00
allReduceFusionKernels.cu [https://nvbugs/5636916][fix] Cherry-pick #10654: Fix accuracy issue of TWO-SHOT AllReduce kernel (#10841) 2026-01-21 17:25:40 +08:00
allReduceFusionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
allReduceWorkspace.cu [https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499) 2026-01-13 17:16:22 +08:00
allReduceWorkspace.h [https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499) 2026-01-13 17:16:22 +08:00
customLowPrecisionAllReduceKernels.cu [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
customLowPrecisionAllReduceKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
mnnvlAllreduceKernels.cu [https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062) 2025-12-23 16:07:07 +08:00
mnnvlAllreduceKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAllReduceFusionKernels.cu [https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499) 2026-01-13 17:16:22 +08:00
moeAllReduceFusionKernels.h [None][fix] Introduce inline namespace to avoid symbol collision (#9541) 2025-12-12 23:32:15 +08:00
moeAlltoAllKernels.cu [TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885) 2026-01-26 17:59:03 +08:00
moeAlltoAllKernels.h [TRTLLM-10048][feat] Fuse the AllGather for expert statistics required by the EPLB. (#10885) 2026-01-26 17:59:03 +08:00