mirror of https://github.com/NVIDIA/nccl-tests.git
synced 2026-01-14 02:47:21 +08:00
Add NCCL_TESTS_SPLIT documentation in the README
parent a89cf07fe8
commit 903918fc54
17
README.md
@@ -71,6 +71,23 @@ All tests support the same set of arguments :
* `-R,--local_register <1/0>` enable local buffer registration on send/recv buffers. Default : 0.
* `-T,--timeout <time in seconds>` timeout each test after specified number of seconds. Default : disabled.

### Running multiple operations in parallel

NCCL tests can partition the set of GPUs into smaller groups, each executing the same operation in parallel.
To split the GPUs, the tests compute a "color" for each rank based on the `NCCL_TESTS_SPLIT` environment variable; all ranks
with the same color end up in the same group. The resulting group is printed next to each GPU at the beginning of the test.

`NCCL_TESTS_SPLIT` uses the syntax `<operation><value>`. The operation can be `AND`, `OR`, `MOD` or `DIV`; the symbols `&`, `|`, `%`, and `/` are also supported. The value can be decimal, hexadecimal (prefixed by `0x`) or binary (prefixed by `0b`).

`NCCL_TESTS_SPLIT_MASK="<value>"` is equivalent to `NCCL_TESTS_SPLIT="&<value>"`.

Here are a few examples:
- `NCCL_TESTS_SPLIT="AND 0x7"` or `NCCL_TESTS_SPLIT="MOD 8"`: On systems with 8 GPUs per node, run 8 parallel operations, each with 1 GPU per node (communicating purely over the network).
- `NCCL_TESTS_SPLIT="OR 0x7"` or `NCCL_TESTS_SPLIT="DIV 8"`: On systems with 8 GPUs per node, run one operation per node, purely intra-node.
- `NCCL_TESTS_SPLIT="AND 0x1"` or `NCCL_TESTS_SPLIT="MOD 2"`: Run two operations, each using every other rank.

Note that the reported bandwidth is per group; to get the total bandwidth used by all groups, multiply by the number of groups.

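To preview how a given `NCCL_TESTS_SPLIT` value would group ranks before launching a job, the rule can be sketched in Python. This is an illustration only, not code from the tests: it assumes the color is simply `rank <operation> value`, which matches the examples above, and the helper names (`parse_split`, `split_groups`, `partition`) are hypothetical.

```python
from collections import defaultdict

# Operation names and symbols listed in the README, mapped to Python operators.
OPS = {
    "AND": lambda rank, value: rank & value,
    "&": lambda rank, value: rank & value,
    "OR": lambda rank, value: rank | value,
    "|": lambda rank, value: rank | value,
    "MOD": lambda rank, value: rank % value,
    "%": lambda rank, value: rank % value,
    "DIV": lambda rank, value: rank // value,
    "/": lambda rank, value: rank // value,
}


def parse_split(spec):
    """Split '<operation><value>' into (operator function, integer value)."""
    spec = spec.strip()
    # Try longer names first so that "AND" is matched before any 1-char symbol.
    for name in sorted(OPS, key=len, reverse=True):
        if spec.startswith(name):
            text = spec[len(name):].strip().lower()
            if text.startswith("0x"):
                value = int(text[2:], 16)   # hexadecimal
            elif text.startswith("0b"):
                value = int(text[2:], 2)    # binary
            else:
                value = int(text, 10)       # decimal
            return OPS[name], value
    raise ValueError(f"unrecognized split specification: {spec!r}")


def split_groups(spec, nranks):
    """Map each color to its ranks, ASSUMING color = rank <operation> value."""
    op, value = parse_split(spec)
    groups = defaultdict(list)
    for rank in range(nranks):
        groups[op(rank, value)].append(rank)
    return dict(groups)


def partition(spec, nranks):
    """Return only the rank groups, ignoring the color values themselves."""
    return sorted(split_groups(spec, nranks).values())


if __name__ == "__main__":
    print(split_groups("MOD 2", 4))   # {0: [0, 2], 1: [1, 3]}
    print(partition("OR 0x7", 16) == partition("DIV 8", 16))  # True
```

Under this assumption, `OR 0x7` and `DIV 8` yield different color values (7 and 15 versus 0 and 1 on 16 ranks) but the same groups, which is why the examples list them as interchangeable.
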
## Copyright

NCCL tests are provided under the BSD license. All source code and accompanying documentation is copyright (c) 2016-2024, NVIDIA CORPORATION. All rights reserved.