Based on code from yakovdyadkin & Scott Moe in MR 349
Adds a -S 1 option that suffixes each performance report line with
a timestamp. The format is "%Y-%m-%d %H:%M:%S".
This is especially useful when using the -N 0 option and looking
for hangs or failure events.
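As an illustration only, the timestamp suffix can be sketched in Python, whose time.strftime accepts the same format string. The helper name with_timestamp and the single-space separator are assumptions and are not taken from the patch.

```python
import time

# Format string quoted in the commit message above.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def with_timestamp(report_line, now=None):
    """Return report_line followed by a local-time timestamp.

    Hypothetical helper: the separator (one space) is an assumption.
    now is seconds since the epoch; None means the current time.
    """
    stamp = time.strftime(TIMESTAMP_FORMAT, time.localtime(now))
    return f"{report_line} {stamp}"

if __name__ == "__main__":
    print(with_timestamp("<performance report line>"))
```

With a suffix like this on every line, the time of the last line printed before a hang can be read directly from the log.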
- add GIN-only A2A kernel implementation
- add hybrid LSA+GIN A2A kernel implementation
- update perf test cases to expose a function for setting
devCommRequirements for each device implementation and
simplify devCommCreate code path to use this directly instead
of complex fallback logic
- add missing call to devCommDestroy
This adds support for writing structured information about the run to a JSON file.
Enable with -J <filename>.json
If the target JSON filename already exists then an incrementing numeric suffix will be
added to create <filename>.json.<n>
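The filename rule above can be sketched as follows. This is a minimal Python illustration, not the patch's implementation: the function name pick_json_path is hypothetical, and starting the suffix at 1 is an assumption, since the commit message does not state the first value of <n>.

```python
import os

def pick_json_path(path, exists=os.path.exists):
    """Return path if it is unused, otherwise the first free path.<n>.

    Assumption: <n> starts at 1 and increments until an unused name is found.
    exists is injectable so the logic can be exercised without touching disk.
    """
    if not exists(path):
        return path
    n = 1
    while exists(f"{path}.{n}"):
        n += 1
    return f"{path}.{n}"

if __name__ == "__main__":
    print(pick_json_path("results.json"))
```

Under this scheme, earlier result files are never overwritten by a later run.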
Fix compilation failure on ctaPolicy with NCCL <= 2.26.
Fix compilation failure on local_register with NCCL <= 2.18.
Fix ctaPolicy behavior if the tests are compiled with NCCL <= 2.26
but run with NCCL >= 2.27.
Added Device API infrastructure and example kernels
Two new command line arguments:
-D <num> device kernel implementation to use <0/1/2/3/4>
-V <num> number of CTAs to launch device kernels with
Added new CTA Policy command line option:
-x <policy> set the CTA Policy <0/1/2>
One thing missing from the stdout of each performance test is
the name of the test that is actually being run.
This patch adds 2 new messages to the stdout. At the beginning
of the execution of a test (e.g. sendrecv_perf) we will now
see this message:
Collective test starting: sendrecv_perf
And at the end, we will now see this:
Collective test concluded: sendrecv_perf
This is needed when running several tests consecutively and we're
trying to parse the stdout to collect the results.
For example, using a Python script to parse the stdout, one could
retrieve the results for each test and plot them on a graph. This
patch makes it easier to implement such a script.
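A minimal sketch of such a parsing script is shown below. It relies only on the two marker messages quoted above. The function name split_by_test is hypothetical, and the sketch makes no assumption about the format of the lines between the markers.

```python
import re

# Marker messages introduced by this patch; search() tolerates any prefix.
START = re.compile(r"Collective test starting: (\S+)")
END = re.compile(r"Collective test concluded: (\S+)")

def split_by_test(stdout_text):
    """Group stdout lines by the test that produced them."""
    sections = {}
    current = None  # name of the test whose output is being read
    for line in stdout_text.splitlines():
        m = START.search(line)
        if m:
            current = m.group(1)
            sections.setdefault(current, [])
            continue
        m = END.search(line)
        if m and m.group(1) == current:
            current = None
            continue
        if current is not None:
            sections[current].append(line)
    return sections

if __name__ == "__main__":
    import sys
    for name, lines in split_by_test(sys.stdin.read()).items():
        print(f"{name}: {len(lines)} lines")
```

Lines printed outside any start/concluded pair are ignored, so output from several tests can be concatenated into one log and still be separated reliably.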
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
Build option DSO=1 generates libverifiable.so which can be
used to reduce the combined binary size.
Build option NAME_SUFFIX can be used to add a suffix to all
generated binaries, e.g. NAME_SUFFIX=_mpi
Added new make target: clean_intermediates
`NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators.
It is overridden by `NCCL_TESTS_SPLIT_MASK` if that is also set.
Examples:
NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72.
The operators can also be written in short form: "%" "&" "|" "/".
Extra spaces in the middle are ignored.
The operator name is not case sensitive.
The following are all equivalent:
NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
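To make the color computation concrete, here is a hedged Python sketch of how such a specification could be interpreted. It is not the parser used by the tests. The name split_color is hypothetical, and treating "OR" as the long form of "|" is an assumption, since only AND, MOD and DIV are spelled out above.

```python
import re

# Keyword -> operation on (rank, value). AND/MOD/DIV come from the examples
# above; "OR" as the keyword behind "|" is an assumption.
OPS = {
    "AND": lambda rank, value: rank & value,
    "OR": lambda rank, value: rank | value,
    "MOD": lambda rank, value: rank % value,
    "DIV": lambda rank, value: rank // value,
}
SHORT = {"&": "AND", "|": "OR", "%": "MOD", "/": "DIV"}

def split_color(spec, rank):
    """Compute the split color for rank from a spec such as "AND 0x7"."""
    # Drop all whitespace and ignore case, as described above.
    compact = re.sub(r"\s+", "", spec).upper()
    m = re.fullmatch(r"(AND|OR|MOD|DIV|[&|%/])(\w+)", compact)
    if m is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op = SHORT.get(m.group(1), m.group(1))
    value = int(m.group(2), 0)  # base 0 accepts decimal, 0x... and 0b...
    return OPS[op](rank, value)

if __name__ == "__main__":
    for spec in ("AND 0x7", "MOD 72", "DIV 72"):
        print(spec, [split_color(spec, r) for r in (0, 7, 8, 72, 100)])
```

Ranks that produce the same color end up in the same communicator after the split, which is why "MOD 72" groups one GPU from each NVLink domain and "DIV 72" groups the GPUs within one domain.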
Ensure that ncclstringtotype iterates only over data types known to
nccl-tests (as indicated by test_typenum), not over a potentially larger
set of all NCCL types.