David Addison
cb6a46fdd6
Update CUDA gencodes
...
Add support for Blackwell sm100 and sm120 from CUDA 12.8
Add support for Hopper sm90 from CUDA 12.0
2025-01-25 17:32:16 -08:00
John Bachan
29f4114f02
Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes.
...
This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.
2024-12-18 11:20:28 -08:00
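
The trimming described in commit 29f4114f02 can be sketched roughly as follows. This is an illustration only, not the code from the tests: the helper name `per_rank_bytes_aligned` and its parameters are hypothetical. It assumes the total buffer is split evenly across ranks and that each rank's share is rounded down to a multiple of 16 bytes, so that with a 16-byte-aligned base address every per-rank offset stays 16-byte aligned even when the rank count is not a power of two.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not taken from nccl-tests): number of bytes each rank
 * receives when total_bytes is divided across nranks, rounded down to a
 * multiple of 16 bytes. */
size_t per_rank_bytes_aligned(size_t total_bytes, int nranks) {
  assert(nranks > 0);
  size_t per_rank = total_bytes / (size_t)nranks;
  /* Clear the low 4 bits to round down to a 16-byte multiple. */
  return per_rank & ~(size_t)15;
}
```

For example, splitting 1000 bytes across 3 ranks gives 333 bytes per rank before trimming and 320 after, so rank offsets 0, 320, and 640 are all multiples of 16.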
Sylvain Jeaugey
8dfeab9eb9
Merge pull request #259 from NVIDIA/fix-ncclstringtotype
...
Future-proof ncclstringtotype
2024-10-24 10:28:02 -07:00
Kamil Iskra
34d6d53910
Future-proof ncclstringtotype
...
Ensure that ncclstringtotype iterates only over data types known to
nccl-tests (as indicated by test_typenum), not over a potentially larger
set of all NCCL types.
2024-10-24 09:21:37 -07:00
David Addison
9d26b8422b
Merge pull request #226 from netgroup/master
...
improve parsing of stepbytes (increment size) argument
2024-07-30 14:58:54 -07:00
David Addison
0d86b5a6e7
Added some missing command line options to README.md
...
Also updated single and multi-node examples.
2024-07-30 14:50:45 -07:00
David Addison
d2d40cc824
Added -N,--run_cycles option
2024-07-25 22:00:23 -07:00
David Addison
3a3f790efd
Merge pull request #240 from OrenLeung/patch-1
...
doc: add all2all factor
2024-07-25 22:00:06 -07:00
Oren
c6eb15875f
doc: add all2all factor
2024-07-24 22:55:00 -04:00
Stefano Salsano
746549b28d
improve parsing of stepbytes (increment size) argument
2024-06-14 11:28:55 +02:00
Kaiming Ouyang
d028efcf35
Change ncclCommRegister size to maxBytes in serial comm init
2024-06-06 06:54:48 -07:00
Giuseppe Congiu
a1efb427e7
Add -R option to register user buffers
2024-06-03 01:04:58 -07:00
David Addison
c6afef0b6f
Added missing MPI_Comm_free() call before MPI_Finalize()
2024-02-05 08:53:54 -08:00
David Addison
1292b25553
Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue
2023-10-12 16:53:32 -07:00
David Addison
6c46206a47
Make the -c option be a datacheck iteration count parameter
...
Default is 1
2023-09-13 14:03:38 -07:00
Sylvain Jeaugey
1a5f551ffd
Merge pull request #146 from yangxingwu/master
...
makefile: remove extra space
2023-06-06 11:58:24 +02:00
yangxingwu
52ea1b2148
makefile: remove extra space
2023-06-06 09:47:50 +00:00
Sylvain Jeaugey
e98ef24bc0
Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling
...
fix handling of variable NVCC.
2023-03-27 11:14:10 +02:00
alan.souza
7ccda3c97b
fix handling of variable NVCC. Permit overriding the variable using environment variables
2023-03-25 16:56:16 -03:00
David Addison
e76e36e9a9
Merge pull request #134 from flx42/patch-1
...
Update README.md to fix -i default increment value.
2023-03-23 09:53:15 -07:00
Felix Abecassis
17d0a42d5a
Update README.md
2023-03-23 09:05:41 -07:00
Sylvain Jeaugey
2cbb968101
Update README.md
...
Improve MPI example to avoid confusion of number of processes / total number of GPUs.
https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369
2023-01-03 08:47:43 +01:00
David Addison
0b4c4cb99f
Add boot_id to the hostname hash due to collisions on Azure
...
Fixes #60
2022-12-12 01:16:46 -08:00
Jithin Jose
0aeba157db
Use DJB2a hash algorithm in getHostHash()
2022-12-12 01:16:38 -08:00
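
For reference, the DJB2a hash named in commit 0aeba157db can be sketched as below. This is a minimal illustration of the general algorithm (start at 5381, then multiply by 33 and XOR in each byte), not the body of `getHostHash()` itself. The function name `djb2a_hash` and the choice of a 64-bit accumulator are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of DJB2a over a NUL-terminated string.
 * The name and the 64-bit width are assumptions for this example. */
uint64_t djb2a_hash(const char *s) {
  assert(s != NULL);
  uint64_t hash = 5381;
  for (size_t i = 0; s[i] != '\0'; i++) {
    /* hash * 33, written as (hash << 5) + hash, then XOR with the byte. */
    hash = ((hash << 5) + hash) ^ (uint64_t)(unsigned char)s[i];
  }
  return hash;
}
```

An empty string hashes to the seed value 5381. A hash of this kind over the hostname alone cannot tell apart two machines that report the same hostname, which is presumably why the follow-up commit 0b4c4cb99f adds the boot_id to the hashed input.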
David Addison
24fcf64ed1
Call cudaFreeHost() on wrongPerGpu not cudaFree()
2022-11-22 11:18:37 -08:00
David Addison
3bd2bd292b
Add fflush(stdout) before perf output
2022-11-22 11:16:47 -08:00
Sylvain Jeaugey
365b92a1ea
Fix build on RHEL7 with GCC 4.8
...
Add -std=c++11 to CXXFLAGS.
Fixes #116.
2022-10-12 01:24:14 -07:00
Sylvain Jeaugey
d313d20a26
Update NCCL tests
2022-09-23 01:13:29 -07:00
David Addison
749573f2d6
Fix preprocessor version check for ncclGetLastError()
...
ncclGetLastError() was added in NCCL 2.13.0
2022-09-07 16:10:41 -07:00
David Addison
afa4c56b6a
Fix an issue with the last commit when data checking is disabled
2022-09-07 11:23:49 -07:00
David Addison
a0a14911ee
Display N/A for error count in AlltoAll in-place test
...
AlltoAll does not support in-place buffers
2022-09-06 13:17:15 -07:00
John Bachan
bc5f7cfb0a
Changed top-level Makefile behavior so that BUILDDIR is interpreted
...
as relative to the top-level directory. This is done by abspath'ing it before
passing it to the subdirectory Makefiles.
The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".
The new behavior is simpler: if BUILDDIR is not set, it defaults
to "build", and the directory holding the final build is always
"${BUILDDIR}" in the top level.
2022-08-23 10:08:49 -07:00
John Bachan
51af5572bf
Resync with NCCL 2.13
...
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-power-of-2 rank counts.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield -> sched_yield()
* Bugfix to the cpu-side barrier/allreduce implementations.
2022-08-22 17:51:06 -07:00
David Addison
8274cb47b6
Merge pull request #96 from NVIDIA/nersc-linkage-fix
...
Add option to statically link cudart
2022-05-26 16:54:44 -07:00
David Addison
de3ddbe261
Add option to statically link cudart
...
Build with CUDARTLIB=cudart_static to remove dynamic linkage
Also removed unused curand and nvToolsExt dependencies
BUG 95
2021-11-10 10:02:41 -08:00
David Addison
7130fa6096
Add MPI_IBM build option
2021-10-25 16:30:57 -07:00
David Addison
f773748b46
Resync with NCCL 2.11
...
New operator: mulsum
New test: gather
2021-09-17 09:02:45 -07:00
David Addison
1f8f541686
Add CUDA graph support only for CUDA 11.3 and later builds
...
Fixes #90
2021-07-13 10:47:47 -07:00
David Addison
b9f90d12a9
Removed MPI_SUPPORT conditional compilation of average flag
2021-07-12 11:43:57 -07:00
David Addison
547e119d35
Fix issues with MPI_Allreduce and multi-threaded tests
2021-07-08 16:42:40 -07:00
David Addison
11cff17a04
Updated with new command line arguments
2021-07-06 16:27:45 -07:00
David Addison
f476f4a17a
Merge branch 'bfloat16'
2021-07-06 10:20:32 -07:00
David Addison
1dfc76eccc
Added new option to report average iteration time
2021-06-30 19:36:07 -07:00
David Addison
1ae8cdc315
Resync with changes in gitlab-master code
2021-06-30 13:16:04 -07:00
David Addison
44df0bf010
Merge pull request #88 from nzmsv/master
...
Cleanup argument error handling and messages
2021-06-30 12:35:47 -07:00
David Addison
9dae3d3a37
Added new tests: scatter, sendrecv, hypercube
2021-06-28 16:49:10 -07:00
David Addison
e55ad3796d
Added support for CUDA graph capture/replay (-G)
2021-06-28 14:19:45 -07:00
David Addison
526eacadf7
Fixed formatting for bfloat16 support
2021-06-28 10:12:34 -07:00
David Addison
cde7e769c1
Add support for ncclAvg operation
2021-06-28 09:41:58 -07:00
Greg Inozemtsev
c4de829d91
Cleanup argument error handling and messages
...
Add error checking for minbytes and maxbytes arguments
Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.
2021-06-04 21:47:40 +00:00
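
The size-argument handling described in commit c4de829d91 (accepting lowercase suffixes and rejecting malformed values) could look roughly like the sketch below. It is not the parser used by the tests: the function name `parse_size`, the return convention, and the restriction to K/M/G suffixes with power-of-two multipliers are all assumptions for illustration. Overflow checking is omitted for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical parser (not the nccl-tests implementation).
 * Accepts a decimal integer with an optional K, M, or G suffix in either
 * case. Returns 0 on success and stores the byte count in *out;
 * returns -1 on malformed input. */
int parse_size(const char *s, unsigned long long *out) {
  assert(out != NULL);
  if (s == NULL || !isdigit((unsigned char)s[0])) return -1;

  char *end;
  unsigned long long value = strtoull(s, &end, 10);
  unsigned long long scale = 1;

  /* toupper() makes the suffix check case-insensitive. */
  switch (toupper((unsigned char)*end)) {
    case '\0': break;                          /* no suffix: plain bytes */
    case 'K': scale = 1ULL << 10; end++; break;
    case 'M': scale = 1ULL << 20; end++; break;
    case 'G': scale = 1ULL << 30; end++; break;
    default: return -1;                        /* unknown suffix */
  }
  if (*end != '\0') return -1;                 /* trailing characters */

  *out = value * scale;
  return 0;
}
```

With this sketch, "8m" and "8M" both yield 8388608 bytes, while inputs such as "12x" or an empty string are rejected so that the caller can print an error and usage text on stderr.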