diff --git a/README.md b/README.md
index e5ac603..b9185a7 100644
--- a/README.md
+++ b/README.md
@@ -32,13 +32,14 @@ NCCL tests can run on multiple processes, multiple threads, and multiple CUDA de
 
 ### Quick examples
 
-Run on single node with 8 GPUs (`-g 8`), scanning from 8 Bytes to 128MBytes :
+Run on single node with 8 GPUs (`-g 8`), scanning from 8 Bytes to 128MiB (Mebibytes), doubling between each test (`-f 2`) :
 
 ```shell
 $ ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
 ```
 
-Run 64 MPI processes on nodes with 8 GPUs each, for a total of 64 GPUs spread across 8 nodes :
+Run 64 MPI processes on nodes with 8 GPUs each, for a total of 64 GPUs spread across 8 nodes.
+Scanning from 8 Bytes to 32GiB (Gibibytes), doubling between each test (`-f 2`).
 (NB: The nccl-tests binaries must be compiled with `MPI=1` for this case)
 
 ```shell
@@ -57,10 +58,10 @@ All tests support the same set of arguments :
   * `-t,--nthreads ` number of threads per process. Default : 1.
   * `-g,--ngpus ` number of gpus per thread. Default : 1.
 * Sizes to scan
-  * `-b,--minbytes ` minimum size to start with. Default : 32M.
-  * `-e,--maxbytes ` maximum size to end at. Default : 32M.
-  * Increments can be either fixed or a multiplication factor. Only one of those should be used
-    * `-i,--stepbytes ` fixed increment between sizes. Default : 1M.
+  * `-b,--minbytes ` minimum size to start with. Default : 32M (Mebibytes).
+  * `-e,--maxbytes ` maximum size to end at. Default : 32M (Mebibytes).
+  * Increments can be either fixed or a multiplication factor. Only one of those should be used.
+    * `-i,--stepbytes ` fixed increment between sizes. Default : 1M (Mebibytes).
     * `-f,--stepfactor ` multiplication factor between sizes. Default : disabled.
 * NCCL operations arguments
   * `-o,--op ` Specify which reduction operation to perform. Only relevant for reduction operations like Allreduce, Reduce or ReduceScatter. Default : Sum.
diff --git a/src/common.cu b/src/common.cu
index 800e0ff..f7a3f28 100644
--- a/src/common.cu
+++ b/src/common.cu
@@ -210,6 +210,7 @@ testResult_t initComms(ncclComm_t* comms, int nComms, int firstRank, int nRanks,
   return testSuccess;
 }
 
+// NOTE: We use the binary system, so M=Mebibytes and G=Gibibytes
 static double parsesize(const char *value) {
     long long int units;
     double size;
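
The binary-unit convention documented in this change can be illustrated with a small standalone sketch. This is not the `parsesize` implementation from `src/common.cu`, whose body is not shown in the diff. The names `parse_size_sketch` and `count_steps` are hypothetical, and the scan loop (multiply by the step factor until the maximum is exceeded) is an assumption about how `-b`, `-e` and `-f` interact.

```c
#include <stdlib.h>

/* Hypothetical helper (not the nccl-tests parsesize): parses "<number>[K|M|G]"
 * using binary, 1024-based suffixes, so M = MiB and G = GiB.
 * Returns -1.0 on malformed input. */
static double parse_size_sketch(const char *value) {
  char *end;
  double size = strtod(value, &end);
  if (end == value || size < 0) return -1.0;  /* no digits, or negative size */

  double units = 1.0;
  switch (*end) {
    case 'G': case 'g': units = 1024.0 * 1024.0 * 1024.0; end++; break;
    case 'M': case 'm': units = 1024.0 * 1024.0; end++; break;
    case 'K': case 'k': units = 1024.0; end++; break;
    default: break;  /* no suffix: plain bytes */
  }
  if (*end != '\0') return -1.0;  /* unknown suffix or trailing characters */
  return size * units;
}

/* Hypothetical helper: counts the sizes visited when scanning from min to max
 * with a multiplication factor (assumed behaviour of -b/-e/-f).
 * Returns -1 if the arguments would never terminate. */
static int count_steps(double min, double max, double factor) {
  if (min <= 0.0 || factor <= 1.0) return -1;
  int n = 0;
  for (double s = min; s <= max; s *= factor) n++;
  return n;
}
```

Under these assumptions, `parse_size_sketch("128M")` yields 134217728 bytes (128 × 1024 × 1024), and `count_steps(8, parse_size_sketch("128M"), 2)` yields 25, one size per power of two from 2^3 to 2^27, which matches the first quick example (`-b 8 -e 128M -f 2`).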