Commit Graph

141 Commits

Author SHA1 Message Date
David Addison
af1dcac92a NCCL_TESTS_VERSION 2.18.2 2026-03-11 15:35:13 -07:00
David Addison
eb0d3d2a00 Display unalign setting in output 2026-03-11 15:05:54 -07:00
David Addison
e02c20b898 NCCL_TESTS_VERSION 2.18.1 2026-03-11 09:55:31 -07:00
David Addison
c1af7df1f3 Update -z option description in README.md 2026-03-11 09:54:53 -07:00
David Addison
ba52a70492 Allow blocking collectives without MPI_Barrier in timing loop 2026-03-11 09:36:53 -07:00
Theofilos Ioannis Manitaras
8d26b23319 Allocate buffers during thread initialization
Signed-off-by: Theofilos Ioannis Manitaras <tmanitaras@nvidia.com>
2026-03-11 09:36:38 -07:00
David Addison
dd0bafd178 NCCL_TESTS_VERSION 2.18.0 2026-03-06 17:55:12 -08:00
David Addison
115fb09377 Add new unalign flag to README.md and update help text 2026-03-06 17:53:29 -08:00
David Addison
e986a6156c Add -u <index> to force unaligned buffer addresses 2026-03-06 17:39:25 -08:00
David Addison
c379e19a71 NCCL_TESTS_VERSION 2.17.10 2026-03-05 15:35:13 -08:00
Ahsan Pervaiz
db221defdb Request GIN to be explicitly enabled in all to all test
Based on the changes in NCCL v2.29.3, update the alltoall test to
either provide a ginConnectionType or set ginForceEnable to true.

Signed-off-by: Ahsan Pervaiz <apervaiz@nvidia.com>
2026-03-05 15:34:19 -08:00
Marcin Malagowski
ae98985f55 Fix Clang compilation errors with VLA initialization
Signed-off-by: David Addison <daddison@nvidia.com>
2026-02-09 10:38:44 -08:00
David Addison
9938d5a657 Fix compilation issues with latest NCCL release headers
Add --extended-lambda to NVCUFLAGS
2026-02-04 16:43:20 -08:00
David Addison
2535da805b NCCL_TESTS_VERSION 2.17.9 2026-02-03 11:04:48 -08:00
mykeduong
85ca91d1b1 Fix: corrected typos in the JSON output
Signed-off-by: David Addison <daddison@nvidia.com>
2026-02-03 11:03:35 -08:00
David Addison
88d7e33207 Add -M memory report option to README.md 2026-01-15 13:32:55 -08:00
David Addison
81463c58d0 NCCL_TESTS_VERSION 2.17.8 2026-01-06 15:00:17 -08:00
David Addison
7278698c1b Clarified use of Mebibytes and Gibibytes for sizes 2026-01-06 14:59:17 -08:00
Katie Gioioso
2656c58421 NCCL_TESTS_VERSION 2.17.7 2025-12-30 20:18:25 +00:00
Katie Gioioso
070d17528c refactor comm init 2025-12-30 20:18:25 +00:00
Katie Gioioso
332e61896f device api 2.28 is not compatible with 2.29. Check versions and print error if there is a mismatch 2025-12-30 20:18:25 +00:00
Katie Gioioso
24874bdaa8 Compatibility with 2.29 device API: use NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER, query properties to check for device api support 2025-12-30 20:18:24 +00:00
David Addison
7106245178 Add include of <limits> due to compilation error 2025-12-30 20:13:13 +00:00
David Addison
760c467f12 Add memory usage report option
Use -M 1 to dump library memory usage information
2025-12-30 20:12:58 +00:00
David Addison
4bc314aa27 Add README.md text for -J option 2025-11-21 11:31:48 -08:00
David Addison
51f2e7ed7c Remove trailing WS when timestamp option not used 2025-11-03 11:23:52 -08:00
David Addison
da0b547b1b NCCL_TESTS_VERSION 2.17.6 2025-10-28 10:22:08 -07:00
David Addison
e2af90af76 Add new report_timestamps option to README.md 2025-10-28 10:21:58 -07:00
David Addison
a62c975681 Add option to suffix a timestamp to each perf line
Based on code from yakovdyadkin & Scott Moe in MR 349

Adds -S 1 option to suffix each performance report line with
a timestamp. Format is "%Y-%m-%d %H:%M:%S"

This is especially useful when using the -N 0 option and looking
for hangs or failure events.
2025-10-28 10:11:23 -07:00
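
The timestamp suffix described in the commit above can be illustrated with a minimal Python sketch. Only the format string is taken from the commit message. The function name `suffix_timestamp`, the single-space separator, and the sample line are assumptions made for illustration, and nccl-tests itself is not written in Python.

```python
from datetime import datetime

# Format string quoted in the commit message.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def suffix_timestamp(perf_line, now=None):
    """Append a formatted timestamp to one performance report line.

    The single-space separator is an assumption; the commit message
    only says the timestamp is added as a suffix.
    """
    if now is None:
        now = datetime.now()
    return f"{perf_line} {now.strftime(TIMESTAMP_FORMAT)}"


if __name__ == "__main__":
    # Hypothetical report line, used only to show the suffix.
    print(suffix_timestamp("example perf line"))
```

With a timestamp on every line, the last line printed before a hang shows roughly when progress stopped.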
David Addison
0bb567cc02 NCCL_TESTS_VERSION 2.17.5 2025-10-28 09:34:56 -07:00
Shane Snyder
013c49e930 add necessary ifdef guards for device API tests 2025-10-28 09:34:26 -07:00
Shane Snyder
f66d20e360 add runtime guards for ncclAlltoAll() 2025-10-28 09:32:17 -07:00
David Addison
3744121a2d NCCL_TESTS_VERSION 2.17.4 2025-10-24 17:11:08 -07:00
David Addison
9641693e9b Add PRINT of nccl-tests, NCCL header and library versions 2025-10-24 17:10:57 -07:00
Shane Snyder
9829ea42b5 add GIN-based device API kernels to alltoall
- add GIN-only A2A kernel implementation
- add hybrid LSA+GIN A2A kernel implementation
- update perf test cases to expose a function for setting
  devCommRequirements for each device implementation and
  simplify devCommCreate code path to use this directly instead
  of complex fallback logic
- add missing call to devCommDestroy
2025-10-24 17:10:34 -07:00
David Addison
00f52811b8 Add support for JSON output to perf test framework
This adds support for writing structured information about the run to a JSON file.

Enable with -J <filename>.json

If the target JSON filename already exists then an incrementing numeric suffix will be
added to create <filename>.json.<n>
2025-10-17 12:01:25 -07:00
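
The filename rule in the commit above (add an incrementing numeric suffix when the target already exists) might look like the following Python sketch. The helper name `pick_output_path`, the injectable `exists` check, and the starting index of 1 are assumptions. The commit message does not state where the numbering begins.

```python
import os


def pick_output_path(filename, exists=os.path.exists):
    """Return `filename` if it is unused, else the first free `filename.<n>`.

    `exists` is injectable so the rule can be exercised without touching
    the filesystem. Starting at n = 1 is an assumption.
    """
    if not exists(filename):
        return filename
    n = 1
    while exists(f"{filename}.{n}"):
        n += 1
    return f"{filename}.{n}"


if __name__ == "__main__":
    # Pretend these two files are already present.
    taken = {"run.json", "run.json.1"}
    print(pick_output_path("run.json", exists=taken.__contains__))
```

The suffix is appended after `.json`, so repeated runs keep earlier results instead of overwriting them.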
Stephen Sachs
abc46770a9 Check if sufficient GPUs are available
The CUDA error message "Test CUDA failure util.cu:706 'invalid device ordinal'"
is not very helpful. Test for this explicitly and guide the user.
2025-10-02 15:48:13 -07:00
Sylvain Jeaugey
9a5c15461a Fix compilation for old NCCL versions
Fix compilation failure on ctaPolicy with NCCL <= 2.26.
Fix compilation failure on local_register with NCCL <= 2.18.
Fix ctaPolicy behavior if the tests are compiled with NCCL <= 2.26
but run with NCCL >= 2.27.
2025-09-05 09:15:06 -07:00
David Addison
e12dbb0a14 Update to align with the NCCL 2.28 release
Added Device API infrastructure and example kernels
Two new command line arguments:

  -D <num> device kernel implementation to use <0/1/2/3/4>
  -V <num> number of CTAs to launch device kernels with

Added new CTA Policy command line option:

  -x <policy> set the CTA Policy <0/1/2>
2025-09-04 17:23:22 -07:00
David Addison
c2cb96faac Update NVCUFLAGS and CXXFLAGS to use -std=c++14 2025-08-29 14:55:31 -07:00
David Addison
f2015cbe82 Modified warmup to run for more message sizes
Loops between minBytes and maxBytes doubling size each time

Reduced default warmup iteration count to 1 (was 5)
2025-08-25 13:57:51 -07:00
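
The warmup sweep described in the commit above, which doubles the size on each pass between minBytes and maxBytes, can be sketched in Python as follows. The function name `warmup_sizes` is hypothetical. Treating maxBytes as an inclusive upper bound and rejecting non-positive sizes are assumptions, not details confirmed by the commit message.

```python
def warmup_sizes(min_bytes, max_bytes):
    """List the message sizes visited when doubling from min_bytes.

    Assumes max_bytes is an inclusive upper bound. A size that would
    overshoot max_bytes is not visited.
    """
    if min_bytes <= 0:
        raise ValueError("min_bytes must be positive")
    if min_bytes > max_bytes:
        raise ValueError("min_bytes must not exceed max_bytes")
    sizes = []
    size = min_bytes
    while size <= max_bytes:
        sizes.append(size)
        size *= 2
    return sizes


if __name__ == "__main__":
    print(warmup_sizes(8, 128))
```

Under these assumptions the number of sizes visited grows with the logarithm of maxBytes / minBytes.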
David Addison
fae7cb4727
Merge pull request #316 from martin-belanger/print-program-name
Print the name of the program being executed before and after test output
2025-07-24 14:58:54 -07:00
David Addison
6edafa0a9c Add extra reserved space during maxBytes calculation
Also, don't allow minBytes > maxBytes
2025-07-23 16:19:37 -07:00
David Addison
def2d3689c Minor fix to Makefile
Move comments to separate lines
2025-07-23 16:04:30 -07:00
David Addison
97ee098516 Add Turing (SM75) support to CUDA 13.0 builds 2025-06-04 17:54:58 -07:00
David Addison
e7c8825b0b Wrap ncclCommWindowRegister() calls within ncclGroup 2025-06-03 10:36:53 -07:00
Martin Belanger
dafb70408d Print the name of the program being executed
One thing missing from the stdout of each performance test is
the name of the test that is actually being run.

This patch adds 2 new messages to the stdout. At the beginning
of the execution of a test (e.g. sendrecv_perf) we will now
see this message:

  Collective test starting: sendrecv_perf

And at the end, we will now see this:

  Collective test concluded: sendrecv_perf

This is needed when running several tests consecutively and we're
trying to parse the stdout to collect the results.

For example, using a Python script to parse the stdout, one could
retrieve the results for each test and plot them on a graph. This
patch makes it easier to implement such a script.

Signed-off-by: Martin Belanger <martin.belanger@dell.com>
2025-06-03 11:43:02 -04:00
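
The parsing use case mentioned in the commit above could be sketched in Python along these lines. Only the two marker strings come from the commit message. The function name `split_by_test`, the regular expressions, and the sample input are hypothetical, and the real output may put other characters before each marker, which is why the sketch searches within the line rather than matching from its start.

```python
import re

# Marker texts quoted in the commit message.
START = re.compile(r"Collective test starting: (\S+)")
END = re.compile(r"Collective test concluded: (\S+)")


def split_by_test(stdout_text):
    """Group output lines by the test that produced them.

    Lines outside any starting/concluded pair are ignored.
    """
    sections = {}
    current = None
    for line in stdout_text.splitlines():
        start = START.search(line)
        if start:
            current = start.group(1)
            sections.setdefault(current, [])
        elif END.search(line):
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections


if __name__ == "__main__":
    # Made-up sample; the result lines are placeholders, not real output.
    sample = "\n".join([
        "Collective test starting: sendrecv_perf",
        "result line 1",
        "Collective test concluded: sendrecv_perf",
    ])
    print(split_by_test(sample))
```

Each value in the returned dictionary holds the raw lines for one test, ready for whatever per-line parsing a plotting script needs.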
David Addison
5290298ab6 Reinstate Pascal support for CUDA 12.8+ builds 2025-06-02 09:29:52 -07:00
David Addison
8bc16f4e01 Need to drop Volta (sm_70) support from CUDA 13.0 2025-05-30 18:04:25 -07:00
David Addison
0c60e6a8e4 Fix formatting errors in README.md 2025-05-30 17:43:30 -07:00