From dcf818955fa6e279e03263c984e95384164c24ad Mon Sep 17 00:00:00 2001
From: Sylvain Jeaugey
Date: Fri, 17 Aug 2018 14:58:44 -0700
Subject: [PATCH] Added a precision for AllGather and ReduceScatter sizes since NCCL uses the size per rank.

---
 doc/PERFORMANCE.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/doc/PERFORMANCE.md b/doc/PERFORMANCE.md
index 97419ec..7cc6ece 100644
--- a/doc/PERFORMANCE.md
+++ b/doc/PERFORMANCE.md
@@ -78,6 +78,8 @@ And the Bus Bandwidth is therefore computed as :
 
 `B = S/t * (n-1)/n = algbw * (n-1)/n`
 
+Note that here, S is the size in bytes of the total array, which for NCCL is equal to `recvcount*sizeof(datatype)*n` as the `recvcount` argument is the count per rank.
+
 ### AllGather
 
 The AllGather operation requires only to perform the assignation part of the allReduce operation :
@@ -94,6 +96,8 @@ And the Bus Bandwidth is therefore computed as :
 
 `B = S/t * (n-1)/n = algbw * (n-1)/n`
 
+Note that here, S is the size in bytes of the total array, which for NCCL is equal to `sendcount*sizeof(datatype)*n` as the `sendcount` argument is the count per rank.
+
 ### Broadcast
 
 The broadcast operation representation is similar to allGather :