mirror of
https://github.com/NVIDIA/nccl-tests.git
synced 2026-04-25 08:58:18 +08:00
Added a precision for AllGather and ReduceScatter sizes since NCCL uses the size per rank.
parent
eb4c43ff3d
commit
dcf818955f
@ -78,6 +78,8 @@ And the Bus Bandwidth is therefore computed as :
`B = S/t * (n-1)/n = algbw * (n-1)/n`

Note that here, S is the size in bytes of the total array, which for NCCL is equal to `recvcount*sizeof(datatype)*n` as the `recvcount` argument is the count per rank.

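As a rough worked illustration of the note above, the arithmetic can be followed with hypothetical numbers that do not come from the original text: n = 8 ranks, `recvcount` = 2^20 elements of a 4-byte datatype, and an assumed measured time of t = 1 ms.

```latex
\begin{align*}
S &= \text{recvcount} \times \text{sizeof(datatype)} \times n
   = 2^{20} \times 4 \times 8 = 33\,554\,432 \ \text{bytes} \\
\text{algbw} &= \frac{S}{t}
   = \frac{33\,554\,432\ \text{B}}{10^{-3}\ \text{s}} \approx 33.55\ \text{GB/s} \\
B &= \text{algbw} \times \frac{n-1}{n}
   = 33.55 \times \frac{7}{8} \approx 29.36\ \text{GB/s}
\end{align*}
```

Using the per-rank size alone for S (4 MiB here) would understate both bandwidths by a factor of n.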
### AllGather
The AllGather operation only needs to perform the assignment part of the AllReduce operation:
@ -94,6 +96,8 @@ And the Bus Bandwidth is therefore computed as :
`B = S/t * (n-1)/n = algbw * (n-1)/n`

Note that here, S is the size in bytes of the total array, which for NCCL is equal to `sendcount*sizeof(datatype)*n` as the `sendcount` argument is the count per rank.

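The size convention above can be sketched in a few lines of Python. This is a minimal illustration, not code from nccl-tests: the function name `allgather_bus_bandwidth` and its parameters are hypothetical, and the elapsed time is assumed to be measured separately.

```python
def allgather_bus_bandwidth(sendcount, dtype_size, n_ranks, elapsed_s):
    """Return (S, algbw, busbw) for an AllGather, in bytes and bytes/second.

    Hypothetical helper: sendcount is the per-rank element count, as passed
    to NCCL, so the total array size S must be scaled by the number of ranks.
    """
    # S = sendcount * sizeof(datatype) * n  (size of the total array)
    total_bytes = sendcount * dtype_size * n_ranks
    # algbw = S / t
    algbw = total_bytes / elapsed_s
    # B = algbw * (n-1)/n
    busbw = algbw * (n_ranks - 1) / n_ranks
    return total_bytes, algbw, busbw


if __name__ == "__main__":
    # Assumed example values: 1024 four-byte elements per rank, 4 ranks, 2 s.
    size, algbw, busbw = allgather_bus_bandwidth(1024, 4, 4, 2.0)
    print(f"S = {size} B, algbw = {algbw} B/s, busbw = {busbw} B/s")
```

The same computation would apply to ReduceScatter with `recvcount` in place of `sendcount`, since both arguments are per-rank counts.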
### Broadcast
The Broadcast operation is represented similarly to AllGather: