allReduceFusionKernels.cu
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
2026-01-13 17:16:22 +08:00
allReduceFusionKernels.h
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
2025-12-12 23:32:15 +08:00
allReduceWorkspace.cu
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
2026-01-13 17:16:22 +08:00
allReduceWorkspace.h
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
2026-01-13 17:16:22 +08:00
customLowPrecisionAllReduceKernels.cu
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
2025-12-12 23:32:15 +08:00
customLowPrecisionAllReduceKernels.h
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
2025-12-12 23:32:15 +08:00
mnnvlAllreduceKernels.cu
[https://nvbugs/5729697][fix] MNNVL Allreduce: use CUDA runtime instead of Macro to get SM version. (#10062)
2025-12-23 16:07:07 +08:00
mnnvlAllreduceKernels.h
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
2025-12-12 23:32:15 +08:00
moeAllReduceFusionKernels.cu
[https://nvbugs/5788127][fix] Use uint64_t as the dtype of lamport_buffer_size to avoid overflow (#10499)
2026-01-13 17:16:22 +08:00
moeAllReduceFusionKernels.h
[None][fix] Introduce inline namespace to avoid symbol collision (#9541)
2025-12-12 23:32:15 +08:00
moeAlltoAllKernels.cu
[None][fix] Add a timeout in MNNVL throughput to prevent hangs if one rank crashes (#9532)
2026-01-21 10:14:39 +08:00
moeAlltoAllKernels.h
[TRTLLM-10126][feat] Increase topk upper limit to 22 for NVLinkOneSid… (#10229)
2025-12-27 22:48:10 +08:00