Add optional testEngine.initCommConfig, invoked from initComms
after the shared ncclConfig_t setup.
sendrecv registers SendRecvInitCommConfig to set maxP2pPeers=2.
Signed-off-by: David Addison <daddison@nvidia.com>
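A minimal sketch of the optional-hook pattern described above. It assumes a trimmed-down engine struct: CommConfig is a hypothetical stand-in for ncclConfig_t that carries only maxP2pPeers, the shared default of -1 is a placeholder, and allReduceEngine is a hypothetical example of a test that registers no hook.

```c
#include <stddef.h>

/* Hypothetical stand-in for ncclConfig_t; only the field this sketch touches. */
typedef struct {
  int maxP2pPeers;
} CommConfig;

/* Hypothetical, trimmed-down test engine with an optional per-test hook. */
typedef struct {
  const char *name;
  void (*initCommConfig)(CommConfig *config); /* NULL when the test needs nothing extra */
} TestEngine;

/* The sendrecv hook narrows the peer limit to 2. */
static void SendRecvInitCommConfig(CommConfig *config) {
  config->maxP2pPeers = 2;
}

static const TestEngine sendRecvEngine = { "sendrecv", SendRecvInitCommConfig };
static const TestEngine allReduceEngine = { "all_reduce", NULL };

/* Shared setup runs first, then the optional per-test override. */
static void initComms(const TestEngine *engine, CommConfig *config) {
  config->maxP2pPeers = -1; /* placeholder for the shared default */
  if (engine->initCommConfig != NULL) {
    engine->initCommConfig(config);
  }
}
```

Because the hook runs after the shared setup, a test only overrides the fields it cares about and inherits everything else.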
Based on the changes in NCCL v2.29.3, update the alltoall test to
either provide a ginConnectionType or set ginForceEnable to true.
Signed-off-by: Ahsan Pervaiz <apervaiz@nvidia.com>
Based on code from yakovdyadkin & Scott Moe in MR 349
Adds -S 1 option to suffix each performance report line with
a timestamp. The format is "%Y-%m-%d %H:%M:%S".
This is especially useful when using the -N 0 option and looking
for hangs or failure events.
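A sketch of appending such a timestamp with the standard strftime(), using the format string given above. The single-space separator and the appendTimestamp helper are assumptions for illustration, not the tests' actual code.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Appends " YYYY-MM-DD HH:MM:SS" to a NUL-terminated report line held in a buffer
 * of cap bytes. Returns 0 on success, -1 if formatting fails or the buffer is too small. */
static int appendTimestamp(char *line, size_t cap, const struct tm *when) {
  char stamp[32];
  if (strftime(stamp, sizeof stamp, "%Y-%m-%d %H:%M:%S", when) == 0) return -1;
  size_t len = strlen(line);
  int n = snprintf(line + len, cap - len, " %s", stamp);
  return (n < 0 || (size_t)n >= cap - len) ? -1 : 0;
}
```

In a report loop the caller would pass localtime(&now) for the current time; localtime_r() is the thread-safe POSIX alternative. Taking a struct tm rather than reading the clock inside the helper keeps the formatting testable.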
- add GIN-only A2A kernel implementation
- add hybrid LSA+GIN A2A kernel implementation
- update perf test cases to expose a function for setting
devCommRequirements for each device implementation and
simplify devCommCreate code path to use this directly instead
of complex fallback logic
- add missing call to devCommDestroy
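The per-implementation setter can be pictured as a function table indexed by the device implementation number, so that the create path fills in requirements with one lookup instead of trying fallbacks. This is only a sketch: DevCommReqs is a hypothetical, simplified stand-in (not NCCL's real requirements struct), and the mapping of indices 1..3 to LSA-only, GIN-only and hybrid kernels (with 0 meaning "no device communicator") is assumed.

```c
#include <stddef.h>

/* Hypothetical, simplified requirements; NOT NCCL's real struct. */
typedef struct {
  int useLsa;       /* kernel uses the LSA path */
  int useGin;       /* kernel uses the GIN path */
  int barrierCount; /* example policy: one barrier per CTA */
} DevCommReqs;

typedef void (*SetReqsFn)(DevCommReqs *reqs, int nCtas);

static void lsaReqs(DevCommReqs *r, int nCtas)    { r->useLsa = 1; r->useGin = 0; r->barrierCount = nCtas; }
static void ginReqs(DevCommReqs *r, int nCtas)    { r->useLsa = 0; r->useGin = 1; r->barrierCount = nCtas; }
static void hybridReqs(DevCommReqs *r, int nCtas) { r->useLsa = 1; r->useGin = 1; r->barrierCount = nCtas; }

/* Index = device implementation; NULL means "no device communicator needed". */
static const SetReqsFn setReqs[] = { NULL, lsaReqs, ginReqs, hybridReqs };

/* Single lookup, no fallback chain. Returns -1 for unknown or host-only implementations. */
static int buildDevCommReqs(int impl, int nCtas, DevCommReqs *out) {
  if (impl < 0 || (size_t)impl >= sizeof setReqs / sizeof setReqs[0] || setReqs[impl] == NULL) {
    return -1;
  }
  DevCommReqs r = { 0, 0, 0 };
  setReqs[impl](&r, nCtas);
  *out = r;
  return 0;
}
```

Keeping the requirements next to each kernel means adding an implementation touches one table entry rather than the shared create logic.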
This adds support for writing structured information about the run to a JSON file.
Enable with -J <filename>.json.
If the target JSON file already exists, an incrementing numeric suffix is
added to create <filename>.json.<n>.
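A sketch of picking the output name under the rule above, assuming the suffix starts at 1 (the text does not say where it starts). pickJsonFilename is a hypothetical helper, not the tests' actual function.

```c
#define _POSIX_C_SOURCE 200809L /* expose access() and friends under strict C modes */
#include <stdio.h>
#include <unistd.h>

/* Writes the first unused name among base, base.1, base.2, ... into out.
 * Returns 0 on success, -1 if a candidate name does not fit in cap bytes. */
static int pickJsonFilename(const char *base, char *out, size_t cap) {
  int n = snprintf(out, cap, "%s", base);
  if (n < 0 || (size_t)n >= cap) return -1;
  for (int i = 1; access(out, F_OK) == 0; ++i) {
    n = snprintf(out, cap, "%s.%d", base, i);
    if (n < 0 || (size_t)n >= cap) return -1;
  }
  return 0;
}
```

Checking with access() and opening afterwards is racy if two runs pick a name at the same moment; opening each candidate with fopen(name, "wx") (C11 exclusive mode) and moving on when it fails would close that gap.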
Fix compilation failure on ctaPolicy with NCCL <= 2.26.
Fix compilation failure on local_register with NCCL <= 2.18.
Fix ctaPolicy behavior if the tests are compiled with NCCL <= 2.26
but run with NCCL >= 2.27.
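A sketch of how a test binary can tell the situations in these fixes apart: a preprocessor guard on NCCL_VERSION_CODE (from nccl.h) keeps the build working with older headers, and a runtime comparison against the version reported by ncclGetVersion() detects the "built against 2.26, running 2.27+" case. The fallback macros only let the sketch build without NCCL headers; the enum and ctaPolicySupport() are hypothetical and only classify the situation, they do not reproduce what the actual fix does.

```c
/* nccl.h provides NCCL_VERSION and NCCL_VERSION_CODE; these fallbacks only let
 * the sketch build without the NCCL headers. */
#ifndef NCCL_VERSION
#define NCCL_VERSION(X, Y, Z) \
  (((X) <= 2 && (Y) <= 8) ? (X) * 1000 + (Y) * 100 + (Z) : (X) * 10000 + (Y) * 100 + (Z))
#endif
#ifndef NCCL_VERSION_CODE
#define NCCL_VERSION_CODE NCCL_VERSION(2, 26, 0) /* sketch only: pretend 2.26 headers */
#endif

/* Compile time: only reference the CTA policy config field if the headers have it. */
#if NCCL_VERSION_CODE >= NCCL_VERSION(2, 27, 0)
#define HAVE_CTA_POLICY_FIELD 1
#else
#define HAVE_CTA_POLICY_FIELD 0
#endif

typedef enum {
  CTA_POLICY_UNSUPPORTED,     /* runtime library is older than 2.27 */
  CTA_POLICY_HEADERS_TOO_OLD, /* runtime supports it, compiled headers do not */
  CTA_POLICY_OK
} CtaPolicySupport;

/* compiledCode: NCCL_VERSION_CODE; runtimeCode: value from ncclGetVersion(&v). */
static CtaPolicySupport ctaPolicySupport(int compiledCode, int runtimeCode) {
  const int minCode = NCCL_VERSION(2, 27, 0);
  if (runtimeCode < minCode) return CTA_POLICY_UNSUPPORTED;
  if (compiledCode < minCode) return CTA_POLICY_HEADERS_TOO_OLD;
  return CTA_POLICY_OK;
}
```

Both codes share the same encoding, so a plain integer comparison is enough once each side has been obtained.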
Added Device API infrastructure and example kernels
Two new command line arguments:
-D <num> device kernel implementation to use <0/1/2/3/4>
-V <num> number of CTAs to launch device kernels with
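A sketch of parsing and range-checking the two options with POSIX getopt(). The defaults (0 and 1) and the rule that -V must be at least 1 are assumptions; DevArgs and parseDevArgs are hypothetical names, and the real tests parse many more options.

```c
#define _POSIX_C_SOURCE 200809L /* getopt()/optarg under strict C modes */
#include <limits.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct {
  int devImpl; /* -D: device kernel implementation, 0..4 */
  int devCtas; /* -V: number of CTAs for device kernels */
} DevArgs;

/* Parses a base-10 integer in [lo, hi]; returns -1 on garbage or out-of-range input. */
static int parseIntInRange(const char *s, long lo, long hi, int *out) {
  char *end;
  long v = strtol(s, &end, 10);
  if (end == s || *end != '\0' || v < lo || v > hi) return -1;
  *out = (int)v;
  return 0;
}

static int parseDevArgs(int argc, char **argv, DevArgs *args) {
  int c;
  args->devImpl = 0; /* hypothetical defaults */
  args->devCtas = 1;
  while ((c = getopt(argc, argv, "D:V:")) != -1) {
    switch (c) {
      case 'D':
        if (parseIntInRange(optarg, 0, 4, &args->devImpl) != 0) return -1;
        break;
      case 'V':
        if (parseIntInRange(optarg, 1, INT_MAX, &args->devCtas) != 0) return -1;
        break;
      default: /* unknown option or missing argument */
        return -1;
    }
  }
  return 0;
}
```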
Added new CTA Policy command line option:
-x <policy> set the CTA Policy <0/1/2>
One thing missing from the stdout of each performance test is
the name of the test that is actually being run.
This patch adds two new messages to stdout. At the beginning
of the execution of a test (e.g. sendrecv_perf) we will now
see this message:
Collective test starting: sendrecv_perf
And at the end, we will now see this:
Collective test concluded: sendrecv_perf
This is needed when several tests run consecutively and the stdout
is parsed to collect the results.
For example, using a Python script to parse the stdout, one could
retrieve the results for each test and plot them on a graph. This
patch makes it easier to implement such a script.
Signed-off-by: Martin Belanger <martin.belanger@dell.com>
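The message above mentions a Python parser; the same idea is sketched here in C to match the other examples. It assumes the markers appear as quoted above, possibly with a prefix on the line, so it searches with strstr() rather than requiring them at the start. MarkerState and feedLine are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

#define START_MARK "Collective test starting: "
#define END_MARK   "Collective test concluded: "

typedef struct {
  char current[128]; /* most recent test name seen in a start marker */
  int inside;        /* 1 between a start marker and the next end marker */
} MarkerState;

/* Feeds one stdout line with the trailing newline removed. Returns 1 if the line
 * belongs to the test named in s->current, 0 for marker lines and lines outside any test. */
static int feedLine(MarkerState *s, const char *line) {
  const char *p;
  if ((p = strstr(line, START_MARK)) != NULL) {
    snprintf(s->current, sizeof s->current, "%s", p + strlen(START_MARK));
    s->inside = 1;
    return 0;
  }
  if (strstr(line, END_MARK) != NULL) {
    s->inside = 0;
    return 0;
  }
  return s->inside;
}
```

A caller reads stdout line by line, strips the newline, and files every line for which feedLine() returns 1 under s->current, giving one result set per test.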