Kaiming Ouyang
59072b7e3d
Add symmetric registration support
...
Passing -R 2 enables symmetric registration.
2025-03-14 17:04:06 -07:00
David Addison
b4300cc79d
Add PCI domain and device ID for GPU device BDF display
2025-02-28 13:25:51 -08:00
Sylvain Jeaugey
903918fc54
Add NCCL_TESTS_SPLIT documentation in the README
2025-02-06 14:10:07 +01:00
Junyu Ma
a89cf07fe8
Perftests: Introduce NCCL_TESTS_SPLIT env
...
`NCCL_TESTS_SPLIT` provides a new way of computing the color used for splitting communicators.
It is overridden by `NCCL_TESTS_SPLIT_MASK` when both are set.
Examples:
NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node.
NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7
NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system.
NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72.
You can also use: "%" "&" "|" "/" for short.
Extra spaces in the middle will be automatically ignored.
Not case sensitive.
The following are all equivalent:
NCCL_TESTS_SPLIT="&0x7"
NCCL_TESTS_SPLIT="&0b111"
NCCL_TESTS_SPLIT="AND 7"
NCCL_TESTS_SPLIT="and 0x7"
2025-02-04 15:18:09 -08:00
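The color rule described in the `NCCL_TESTS_SPLIT` commit above can be sketched as follows. This is a minimal Python illustration, not the code from nccl-tests: the function name `split_color`, the `OR` spelling for `|`, and the exact tokenizing rules are assumptions inferred from the commit message.

```python
import re

# Operator spellings from the commit message ("OR" as the long form of "|"
# is an assumption; the message only names AND, MOD and DIV explicitly).
_OPERATIONS = {
    "AND": lambda rank, value: rank & value,
    "&": lambda rank, value: rank & value,
    "OR": lambda rank, value: rank | value,
    "|": lambda rank, value: rank | value,
    "MOD": lambda rank, value: rank % value,
    "%": lambda rank, value: rank % value,
    "DIV": lambda rank, value: rank // value,
    "/": lambda rank, value: rank // value,
}

_SPEC = re.compile(r"\s*(AND|OR|MOD|DIV|[&|%/])\s*(\S+)\s*", re.IGNORECASE)


def split_color(spec: str, rank: int) -> int:
    """Hypothetical helper: compute a communicator color from a split spec."""
    match = _SPEC.fullmatch(spec)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    operation = _OPERATIONS[match.group(1).upper()]
    # Base 0 lets int() accept decimal, 0x... and 0b... operands.
    value = int(match.group(2), 0)
    return operation(rank, value)
```

Ranks that produce the same color would end up in the same communicator, so `split_color("MOD 72", rank)` groups together ranks that share a position within each block of 72, while `split_color("DIV 72", rank)` groups each block of 72 consecutive ranks.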
David Addison
cb6a46fdd6
Update CUDA gencodes
...
Add support for Blackwell sm100 and sm120 from CUDA 12.8
Add support for Hopper sm90 from CUDA 12.0
2025-01-25 17:32:16 -08:00
John Bachan
29f4114f02
Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes.
...
This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.
2024-12-18 11:20:28 -08:00
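The trimming described in the commit above can be illustrated with a short Python sketch. It assumes the per-rank share is rounded down to a multiple of 16 bytes; the helper name `per_rank_bytes` is hypothetical and the actual tests may compute sizes differently.

```python
ALIGN_BYTES = 16  # alignment named in the commit message


def per_rank_bytes(total_bytes: int, nranks: int, align: int = ALIGN_BYTES) -> int:
    """Hypothetical helper: per-rank chunk size, trimmed to a multiple of `align`."""
    share = total_bytes // nranks
    # Round down so every rank's offset (rank * share) stays aligned.
    return share - share % align
```

With a non-power-of-2 rank count, for example 1000 bytes across 3 ranks, the untrimmed share of 333 bytes would place the second rank's chunk at an unaligned offset; trimming to 320 bytes keeps every offset a multiple of 16.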
Sylvain Jeaugey
8dfeab9eb9
Merge pull request #259 from NVIDIA/fix-ncclstringtotype
...
Future-proof ncclstringtotype
2024-10-24 10:28:02 -07:00
Kamil Iskra
34d6d53910
Future-proof ncclstringtotype
...
Ensure that ncclstringtotype iterates only over data types known to
nccl-tests (as indicated by test_typenum), not over a potentially larger
set of all NCCL types.
2024-10-24 09:21:37 -07:00
David Addison
9d26b8422b
Merge pull request #226 from netgroup/master
...
improve parsing of stepbytes (increment size) argument
2024-07-30 14:58:54 -07:00
David Addison
0d86b5a6e7
Added some missing command line options to README.md
...
Also updated single and multi-node examples.
2024-07-30 14:50:45 -07:00
David Addison
d2d40cc824
Added -N,--run_cycles option
2024-07-25 22:00:23 -07:00
David Addison
3a3f790efd
Merge pull request #240 from OrenLeung/patch-1
...
doc: add all2all factor
2024-07-25 22:00:06 -07:00
Oren
c6eb15875f
doc: add all2all factor
2024-07-24 22:55:00 -04:00
Stefano Salsano
746549b28d
improve parsing of stepbytes (increment size) argument
2024-06-14 11:28:55 +02:00
Kaiming Ouyang
d028efcf35
Change ncclCommRegister size to maxBytes in serial comm init
2024-06-06 06:54:48 -07:00
Giuseppe Congiu
a1efb427e7
Add -R option to register user buffers
2024-06-03 01:04:58 -07:00
David Addison
c6afef0b6f
Added missing MPI_Comm_free() call before MPI_Finalize()
2024-02-05 08:53:54 -08:00
David Addison
1292b25553
Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue
2023-10-12 16:53:32 -07:00
David Addison
6c46206a47
Make the -c option be a datacheck iteration count parameter
...
Default is 1
2023-09-13 14:03:38 -07:00
Sylvain Jeaugey
1a5f551ffd
Merge pull request #146 from yangxingwu/master
...
makefile: remove extra space
2023-06-06 11:58:24 +02:00
yangxingwu
52ea1b2148
makefile: remove extra space
2023-06-06 09:47:50 +00:00
Sylvain Jeaugey
e98ef24bc0
Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling
...
fix handling of variable NVCC.
2023-03-27 11:14:10 +02:00
alan.souza
7ccda3c97b
fix handling of variable NVCC. Permit overriding the variable using environment variables
2023-03-25 16:56:16 -03:00
David Addison
e76e36e9a9
Merge pull request #134 from flx42/patch-1
...
Update README.md to fix -i default increment value.
2023-03-23 09:53:15 -07:00
Felix Abecassis
17d0a42d5a
Update README.md
2023-03-23 09:05:41 -07:00
Sylvain Jeaugey
2cbb968101
Update README.md
...
Improve MPI example to avoid confusion of number of processes / total number of GPUs.
https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369
2023-01-03 08:47:43 +01:00
David Addison
0b4c4cb99f
Add boot_id to the hostname hash due to collisions on Azure
...
Fixes #60
2022-12-12 01:16:46 -08:00
Jithin Jose
0aeba157db
Use DJB2a hash algorithm in getHostHash()
2022-12-12 01:16:38 -08:00
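For reference, DJB2a is the XOR variant of Bernstein's string hash: start from 5381 and, for each byte, multiply by 33 and XOR in the byte. The Python sketch below assumes a 64-bit result and is not the body of getHostHash(); the name `djb2a_64` is hypothetical.

```python
def djb2a_64(data: bytes) -> int:
    """DJB2a hash, truncated to 64 bits (the width is an assumption)."""
    value = 5381
    for byte in data:
        # hash = (hash * 33) XOR byte, wrapped like an unsigned 64-bit integer
        value = ((value * 33) ^ byte) & 0xFFFFFFFFFFFFFFFF
    return value
```

Hashing a longer input string, for example a hostname concatenated with another per-machine identifier, reduces the chance that two distinct hosts map to the same value.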
David Addison
24fcf64ed1
Call cudaFreeHost() on wrongPerGpu not cudaFree()
2022-11-22 11:18:37 -08:00
David Addison
3bd2bd292b
Add fflush(stdout) before perf output
2022-11-22 11:16:47 -08:00
Sylvain Jeaugey
365b92a1ea
Fix build on RHEL7 with GCC 4.8
...
Add -std=c++11 to CXXFLAGS.
Fixes #116.
2022-10-12 01:24:14 -07:00
Sylvain Jeaugey
d313d20a26
Update NCCL tests
2022-09-23 01:13:29 -07:00
David Addison
749573f2d6
Fix preprocessor version check for ncclGetLastError()
...
ncclGetLastError() was added in NCCL 2.13.0
2022-09-07 16:10:41 -07:00
David Addison
afa4c56b6a
Fix an issue with the last commit when data checking is disabled
2022-09-07 11:23:49 -07:00
David Addison
a0a14911ee
Display N/A for error count in AlltoAll in-place test
...
AlltoAll does not support in-place buffers
2022-09-06 13:17:15 -07:00
John Bachan
bc5f7cfb0a
Changed top-level Makefile behavior so that BUILDDIR is interpreted
...
as relative to the top-level directory. This is done by abspath'ing it before
passing it to the subdirectory Makefiles.
The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".
The new behavior is simpler: if BUILDDIR is not set then it defaults
to "build", and the directory holding the final build is always at just
"${BUILDDIR}" in the top level.
2022-08-23 10:08:49 -07:00
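The difference between the two BUILDDIR behaviors described in the commit above can be modeled with a small Python sketch. It only mirrors the path rules stated in the message; the function names are hypothetical, and the Makefiles themselves use make's `abspath` rather than anything shown here.

```python
import posixpath
from typing import Optional


def old_build_dir(top: str, builddir: Optional[str] = None) -> str:
    """Old rule: unset -> <top>/build, set -> <top>/src/<BUILDDIR>."""
    if builddir is None:
        return posixpath.join(top, "build")
    return posixpath.normpath(posixpath.join(top, "src", builddir))


def new_build_dir(top: str, builddir: Optional[str] = None) -> str:
    """New rule: BUILDDIR defaults to "build" and is always resolved from <top>."""
    return posixpath.normpath(posixpath.join(top, builddir or "build"))
```

For a checkout at `/repo`, `BUILDDIR=out` resolves to `/repo/src/out` under the old rule and to `/repo/out` under the new one, while the unset case gives `/repo/build` in both.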
John Bachan
51af5572bf
Resync with NCCL 2.13
...
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors are now reported as the number of wrong elements instead of the max
deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield() with sched_yield().
* Bugfix to the cpu-side barrier/allreduce implementations.
2022-08-22 17:51:06 -07:00
David Addison
8274cb47b6
Merge pull request #96 from NVIDIA/nersc-linkage-fix
...
Add option to statically link cudart
2022-05-26 16:54:44 -07:00
David Addison
de3ddbe261
Add option to statically link cudart
...
Build with CUDARTLIB=cudart_static to remove dynamic linkage
Also removed unused curand and nvToolsExt dependencies
BUG 95
2021-11-10 10:02:41 -08:00
David Addison
7130fa6096
Add MPI_IBM build option
2021-10-25 16:30:57 -07:00
David Addison
f773748b46
Resync with NCCL 2.11
...
New operator: mulsum
New test: gather
2021-09-17 09:02:45 -07:00
David Addison
1f8f541686
Add CUDA graph support only for CUDA 11.3 and later builds
...
Fixes #90
2021-07-13 10:47:47 -07:00
David Addison
b9f90d12a9
Removed MPI_SUPPORT conditional compilation of average flag
2021-07-12 11:43:57 -07:00
David Addison
547e119d35
Fix issues with MPI_Allreduce and multi-threaded tests
2021-07-08 16:42:40 -07:00
David Addison
11cff17a04
Updated with new command line arguments
2021-07-06 16:27:45 -07:00
David Addison
f476f4a17a
Merge branch 'bfloat16'
2021-07-06 10:20:32 -07:00
David Addison
1dfc76eccc
Added new option to report average iteration time
2021-06-30 19:36:07 -07:00
David Addison
1ae8cdc315
Resync with changes in gitlab-master code
2021-06-30 13:16:04 -07:00
David Addison
44df0bf010
Merge pull request #88 from nzmsv/master
...
Cleanup argument error handling and messages
2021-06-30 12:35:47 -07:00
David Addison
9dae3d3a37
Added new tests: scatter, sendrecv, hypercube
2021-06-28 16:49:10 -07:00