Commit Graph

75 Commits

Author SHA1 Message Date
Oren
c6eb15875f
doc: add all2all factor 2024-07-24 22:55:00 -04:00
Kaiming Ouyang
d028efcf35 Change ncclCommRegister size to maxBytes in serial comm init 2024-06-06 06:54:48 -07:00
Giuseppe Congiu
a1efb427e7 Add -R option to register user buffers 2024-06-03 01:04:58 -07:00
David Addison
c6afef0b6f Added missing MPI_Comm_free() call before MPI_Finalize() 2024-02-05 08:53:54 -08:00
David Addison
1292b25553 Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue 2023-10-12 16:53:32 -07:00
David Addison
6c46206a47 Make the -c option be a datacheck iteration count parameter
Default is 1
2023-09-13 14:03:38 -07:00
Sylvain Jeaugey
1a5f551ffd
Merge pull request #146 from yangxingwu/master
makefile: remove extra space
2023-06-06 11:58:24 +02:00
yangxingwu
52ea1b2148 makefile: remove extra space 2023-06-06 09:47:50 +00:00
Sylvain Jeaugey
e98ef24bc0
Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling
fix handling of variable NVCC.
2023-03-27 11:14:10 +02:00
alan.souza
7ccda3c97b fix handling of variable NVCC. Permit overriding the variable using environment variables 2023-03-25 16:56:16 -03:00
David Addison
e76e36e9a9
Merge pull request #134 from flx42/patch-1
Update README.md to fix -i default increment value.
2023-03-23 09:53:15 -07:00
Felix Abecassis
17d0a42d5a
Update README.md 2023-03-23 09:05:41 -07:00
Sylvain Jeaugey
2cbb968101
Update README.md
Improve MPI example to avoid confusion of number of processes / total number of GPUs.

https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369
2023-01-03 08:47:43 +01:00
David Addison
0b4c4cb99f Add boot_id to the hostname hash due to collisions on Azure
Fixes #60
2022-12-12 01:16:46 -08:00
Jithin Jose
0aeba157db Use DJB2a hash algorithm in getHostHash() 2022-12-12 01:16:38 -08:00
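
For readers unfamiliar with the hash named in this commit, here is a minimal sketch of the general DJB2a scheme (start at 5381, then multiply by 33 and XOR in each byte). The function name `djb2a` and the `std::string` interface are assumptions made for illustration; this is not the repository's getHostHash() code.

```cpp
#include <cstdint>
#include <string>

// Illustrative DJB2a string hash (hypothetical helper, not the repository's code).
// Unsigned 64-bit overflow wraps around, which is well defined in C++.
std::uint64_t djb2a(const std::string& text) {
  std::uint64_t hash = 5381;                // conventional DJB2 seed
  for (unsigned char byte : text) {
    hash = (hash * 33) ^ byte;              // DJB2a: multiply by 33, then XOR the byte
  }
  return hash;
}
```

A caller that wanted to tell hosts apart could hash a string built from the hostname plus some other per-boot identifier, in the spirit of the boot_id commit above; how the repository actually combines them is not shown in this log.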
David Addison
24fcf64ed1 Call cudaFreeHost() on wrongPerGpu not cudaFree() 2022-11-22 11:18:37 -08:00
David Addison
3bd2bd292b Add fflush(stdout) before perf output 2022-11-22 11:16:47 -08:00
Sylvain Jeaugey
365b92a1ea Fix build on RHEL7 with GCC 4.8
Add -std=c++11 to CXXFLAGS.
Fixes #116.
2022-10-12 01:24:14 -07:00
Sylvain Jeaugey
d313d20a26 Update NCCL tests 2022-09-23 01:13:29 -07:00
David Addison
749573f2d6 Fix preprocessor version check for ncclGetLastError()
ncclGetLastError() was added in NCCL 2.13.0
2022-09-07 16:10:41 -07:00
David Addison
afa4c56b6a Fix an issue with the last commit when data checking is disabled 2022-09-07 11:23:49 -07:00
David Addison
a0a14911ee Display N/A for error count in AlltoAll in-place test
AlltoAll does not support in-place buffers
2022-09-06 13:17:15 -07:00
John Bachan
bc5f7cfb0a Changed top-level Makefile behavior so that BUILDDIR is interpreted
as relative to top-level directory. This is done by abspath'ing it before
passing it to subdirectory Makefiles.

The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".

The new behavior is simpler: if BUILDDIR is not set then it defaults
to "build", and the directory holding the final build is always at just
"${BUILDDIR}" in the top level.
2022-08-23 10:08:49 -07:00
John Bachan
51af5572bf Resync with NCCL 2.13
* Added "verifiable", a suite of kernels for generating and verifying reduction
  input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
  deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield -> sched_yield()
* Bugfix to the cpu-side barrier/allreduce implementations.
2022-08-22 17:51:06 -07:00
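
One item in the list above skips the hypercube test when the rank count is not a power of 2. A minimal sketch of such a check follows, using the common bit trick; the name `isPowerOfTwo` is hypothetical and the way the test itself performs the check is not shown in this log.

```cpp
// Hypothetical helper: true when n is 1, 2, 4, 8, ...
// A power of two has exactly one bit set, so clearing the lowest set bit
// with n & (n - 1) leaves zero. Zero itself is excluded explicitly.
bool isPowerOfTwo(unsigned int n) {
  return n != 0 && (n & (n - 1)) == 0;
}
```

Under this sketch, a run with 8 ranks would proceed and a run with 6 ranks would be skipped.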
David Addison
8274cb47b6
Merge pull request #96 from NVIDIA/nersc-linkage-fix
Add option to statically link cudart
2022-05-26 16:54:44 -07:00
David Addison
de3ddbe261 Add option to statically link cudart
Build with CUDARTLIB=cudart_static to remove dynamic linkage

Also removed unused curand and nvToolsExt dependencies

BUG 95
2021-11-10 10:02:41 -08:00
David Addison
7130fa6096 Add MPI_IBM build option 2021-10-25 16:30:57 -07:00
David Addison
f773748b46 Resync with NCCL 2.11
New operator: mulsum
New test: gather
2021-09-17 09:02:45 -07:00
David Addison
1f8f541686 Add CUDA graph support only for CUDA 11.3 and later builds
Fixes #90
2021-07-13 10:47:47 -07:00
David Addison
b9f90d12a9 Removed MPI_SUPPORT conditional compilation of average flag 2021-07-12 11:43:57 -07:00
David Addison
547e119d35 Fix issues with MPI_Allreduce and multi-threaded tests 2021-07-08 16:42:40 -07:00
David Addison
11cff17a04 Updated with new command line arguments 2021-07-06 16:27:45 -07:00
David Addison
f476f4a17a Merge branch 'bfloat16' 2021-07-06 10:20:32 -07:00
David Addison
1dfc76eccc Added new option to report average iteration time 2021-06-30 19:36:07 -07:00
David Addison
1ae8cdc315 Resync with changes in gitlab-master code 2021-06-30 13:16:04 -07:00
David Addison
44df0bf010
Merge pull request #88 from nzmsv/master
Cleanup argument error handling and messages
2021-06-30 12:35:47 -07:00
David Addison
9dae3d3a37 Added new tests: scatter, sendrecv, hypercube 2021-06-28 16:49:10 -07:00
David Addison
e55ad3796d Added support for CUDA graph capture/replay (-G) 2021-06-28 14:19:45 -07:00
David Addison
526eacadf7 Fixed formatting for bfloat16 support 2021-06-28 10:12:34 -07:00
David Addison
cde7e769c1 Add support for ncclAvg operation 2021-06-28 09:41:58 -07:00
Greg Inozemtsev
c4de829d91 Cleanup argument error handling and messages
Add error checking for minbytes and maxbytes arguments

Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.
2021-06-04 21:47:40 +00:00
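
The commit above describes validating size arguments and accepting lowercase suffixes. A rough sketch of that kind of parser is given below. The name `parseSize`, the K/M/G suffix set, and the 1024-based multipliers are assumptions for illustration, not the repository's implementation.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parses "8", "128M", or "1g" into a byte count.
// Returns false on malformed input or overflow and leaves *bytes untouched.
bool parseSize(const char* text, std::uint64_t* bytes) {
  if (text == nullptr || !std::isdigit(static_cast<unsigned char>(*text))) {
    return false;                           // reject empty input, signs, whitespace
  }
  errno = 0;
  char* end = nullptr;
  unsigned long long value = std::strtoull(text, &end, 10);
  if (errno == ERANGE) return false;        // number does not fit

  std::uint64_t unit = 1;
  switch (std::toupper(static_cast<unsigned char>(*end))) {
    case '\0': break;                       // plain byte count, no suffix
    case 'K': unit = 1ULL << 10; ++end; break;
    case 'M': unit = 1ULL << 20; ++end; break;
    case 'G': unit = 1ULL << 30; ++end; break;
    default: return false;                  // unknown suffix
  }
  if (*end != '\0') return false;           // trailing characters after the suffix
  if (value > UINT64_MAX / unit) return false;  // multiplication would overflow
  *bytes = value * unit;
  return true;
}
```

A caller could print an error and the usage text to stderr whenever the function returns false, matching the behavior the commit message describes.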
Sylvain Jeaugey
e12c35d84b
Update PERFORMANCE.md 2021-05-27 09:12:52 -07:00
David Addison
e37545e491 Add support for new datatype: bfloat16 2021-03-15 17:13:35 -07:00
David Addison
0b30de583f
Merge pull request #67 from NVIDIA/big_buffers
Do not allocate memory for expected buffer if checking disabled
2021-02-04 09:24:09 -08:00
David Addison
7677f3f608 Do not allocate memory for expected buffer if checking disabled
This allows the tests to be run with larger buffers
2021-01-20 17:08:40 -08:00
David Addison
2f9bba9f20
Merge pull request #64 from NVIDIA/hosthash_boot_id
Add boot_id to the hostname hash due to collisions on Azure
2021-01-11 10:02:20 -08:00
David Addison
ae1ce98e69 Add boot_id to the hostname hash due to collisions on Azure
Fixes #60
2021-01-04 11:38:45 -08:00
Sylvain Jeaugey
464f038106
Merge pull request #61 from jithinjosepkl/master
Use DJB2a hash algorithm in getHostHash()
2020-12-18 10:39:43 -08:00
Jithin Jose
da67a81c8e Use DJB2a hash algorithm in getHostHash() 2020-12-18 10:12:54 -08:00
Sylvain Jeaugey
bd0755c95c
Merge pull request #48 from NVIDIA/fix-makefile-typo
Fix typo in src/Makefile
2020-06-24 14:52:55 -07:00