Update README.md

Improve the MPI example to avoid confusing the number of processes with the total number of GPUs. https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369
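
The arithmetic behind the corrected example can be checked with a small sketch. It assumes the default of one thread per process, so that the total GPU count is the `-np` value times the `-g` value. The variable names `NPROCS`, `GPUS_PER_PROC`, and `TOTAL_GPUS` are hypothetical helpers for illustration and are not part of nccl-tests; the launch line is only printed, not executed.

```shell
# Hypothetical helper variables for illustration; not part of nccl-tests.
NPROCS=10          # value passed to mpirun -np (number of MPI processes)
GPUS_PER_PROC=4    # value passed to -g (GPUs used by each process)

# Assuming one thread per process, total GPUs = processes * GPUs per process.
TOTAL_GPUS=$((NPROCS * GPUS_PER_PROC))
echo "total GPUs: ${TOTAL_GPUS}"

# Print the matching launch line instead of running it.
echo "mpirun -np ${NPROCS} ./build/all_reduce_perf -b 8 -e 128M -f 2 -g ${GPUS_PER_PROC}"
```

With these values the sketch reports 40 GPUs, matching the updated README wording; the old `-np 40` with `-g 4` would instead have described 160 GPUs.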
2026-01-14 02:47:21 +08:00 · 2023-01-03 08:47:43 +01:00 · 2cbb968101
commit 2cbb968101
parent 0b4c4cb99f
1 changed file with 2 additions and 2 deletions
--- a/README.md
+++ b/README.md
@@ -29,9 +29,9 @@ Run on 8 GPUs (`-g 8`), scanning from 8 Bytes to 128MBytes :
 $ ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 8
 ```

-Run with MPI on 40 processes (potentially on multiple nodes) with 4 GPUs each :
+Run with MPI on 10 processes (potentially on multiple nodes) with 4 GPUs each, for a total of 40 GPUs:
 ```shell
-$ mpirun -np 40 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4
+$ mpirun -np 10 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 4
 ```

 ### Performance