John Bachan
51af5572bf
Resync with NCCL 2.13
...
* Added "verifiable", a suite of kernels for generating and verifying reduction
input and output arrays in a bit-precise way.
* Data corruption errors are now reported as the number of wrong elements
instead of the maximum deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield() with sched_yield().
* Bugfix to the cpu-side barrier/allreduce implementations.
2022-08-22 17:51:06 -07:00
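The restriction of hypercube to power-of-two rank counts can be illustrated with a minimal sketch. It assumes the textbook hypercube pairing, where a rank exchanges with `rank XOR 2^k` in dimension `k`; the function names are hypothetical and are not taken from the test sources.

```python
def is_power_of_two(n: int) -> bool:
    """Return True when n is a positive power of two."""
    return n > 0 and (n & (n - 1)) == 0


def hypercube_partners(rank: int, nranks: int) -> list[int]:
    """List the peer of `rank` in each hypercube dimension.

    Assumption: peers are paired as rank XOR 2^k (textbook hypercube).
    With a rank count that is not a power of two, some of these peers
    would not exist, so the sketch rejects that case up front.
    """
    if not is_power_of_two(nranks):
        raise ValueError("hypercube needs a power-of-two number of ranks")
    dimensions = nranks.bit_length() - 1  # log2(nranks) for powers of two
    return [rank ^ (1 << k) for k in range(dimensions)]
```

For example, with 8 ranks, rank 5 pairs with ranks 4, 7 and 1 in turn; with 6 ranks the XOR of rank 5 with 2 would be rank 7, which does not exist.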
David Addison
8274cb47b6
Merge pull request #96 from NVIDIA/nersc-linkage-fix
...
Add option to statically link cudart
2022-05-26 16:54:44 -07:00
David Addison
de3ddbe261
Add option to statically link cudart
...
Build with CUDARTLIB=cudart_static to remove dynamic linkage
Also removed unused curand and nvToolsExt dependencies
BUG 95
2021-11-10 10:02:41 -08:00
David Addison
7130fa6096
Add MPI_IBM build option
2021-10-25 16:30:57 -07:00
David Addison
f773748b46
Resync with NCCL 2.11
...
New operator: mulsum
New test: gather
2021-09-17 09:02:45 -07:00
David Addison
1f8f541686
Add CUDA graph support only for CUDA 11.3 and later builds
...
Fixes #90
2021-07-13 10:47:47 -07:00
David Addison
b9f90d12a9
Removed MPI_SUPPORT conditional compilation of average flag
2021-07-12 11:43:57 -07:00
David Addison
547e119d35
Fix issues with MPI_Allreduce and multi-threaded tests
2021-07-08 16:42:40 -07:00
David Addison
11cff17a04
Updated with new command line arguments
2021-07-06 16:27:45 -07:00
David Addison
f476f4a17a
Merge branch 'bfloat16'
2021-07-06 10:20:32 -07:00
David Addison
1dfc76eccc
Added new option to report average iteration time
2021-06-30 19:36:07 -07:00
David Addison
1ae8cdc315
Resync with changes in gitlab-master code
2021-06-30 13:16:04 -07:00
David Addison
44df0bf010
Merge pull request #88 from nzmsv/master
...
Cleanup argument error handling and messages
2021-06-30 12:35:47 -07:00
David Addison
9dae3d3a37
Added new tests: scatter, sendrecv, hypercube
2021-06-28 16:49:10 -07:00
David Addison
e55ad3796d
Added support for CUDA graph capture/replay (-G)
2021-06-28 14:19:45 -07:00
David Addison
526eacadf7
Fixed formatting for bfloat16 support
2021-06-28 10:12:34 -07:00
David Addison
cde7e769c1
Add support for ncclAvg operation
2021-06-28 09:41:58 -07:00
Greg Inozemtsev
c4de829d91
Cleanup argument error handling and messages
...
Add error checking for minbytes and maxbytes arguments
Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.
2021-06-04 21:47:40 +00:00
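The size-argument handling described in this commit can be sketched as follows. This is an illustration only, not the repository's parser: the function names, the binary multipliers (K = 1024, and so on) and the specific check that the minimum does not exceed the maximum are all assumptions.

```python
import sys

# Assumed multipliers; the real parser may define its suffixes differently.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text: str) -> int:
    """Parse strings such as '8', '64k' or '1G' into a byte count."""
    s = text.strip().upper()  # upper() makes lowercase suffixes acceptable
    suffix = s[-1] if s and s[-1] in "KMG" else ""
    digits = s[:-1] if suffix else s
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"invalid size argument: {text!r}")
    return int(digits) * _UNITS[suffix]


def parse_range(minbytes: str, maxbytes: str) -> tuple[int, int]:
    """Parse both bounds and reject an inverted range (assumed check)."""
    lo, hi = parse_size(minbytes), parse_size(maxbytes)
    if lo > hi:
        raise ValueError(f"minbytes ({lo}) is larger than maxbytes ({hi})")
    return lo, hi


def report_error(message: str) -> None:
    """Send diagnostics to stderr so they do not mix with result output."""
    print(f"error: {message}", file=sys.stderr)
```

A caller would wrap `parse_range` in a `try`/`except ValueError` block and pass the message to `report_error` before printing usage.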
Sylvain Jeaugey
e12c35d84b
Update PERFORMANCE.md
2021-05-27 09:12:52 -07:00
David Addison
e37545e491
Add support for new datatype: bfloat16
2021-03-15 17:13:35 -07:00
David Addison
0b30de583f
Merge pull request #67 from NVIDIA/big_buffers
...
Do not allocate memory for expected buffer if checking disabled
2021-02-04 09:24:09 -08:00
David Addison
7677f3f608
Do not allocate memory for expected buffer if checking disabled
...
This allows the tests to be run with larger buffers
2021-01-20 17:08:40 -08:00
David Addison
2f9bba9f20
Merge pull request #64 from NVIDIA/hosthash_boot_id
...
Add boot_id to the hostname hash due to collisions on Azure
2021-01-11 10:02:20 -08:00
David Addison
ae1ce98e69
Add boot_id to the hostname hash due to collisions on Azure
...
Fixes #60
2021-01-04 11:38:45 -08:00
Sylvain Jeaugey
464f038106
Merge pull request #61 from jithinjosepkl/master
...
Use DJB2a hash algorithm in getHostHash()
2020-12-18 10:39:43 -08:00
Jithin Jose
da67a81c8e
Use DJB2a hash algorithm in getHostHash()
2020-12-18 10:12:54 -08:00
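For readers unfamiliar with DJB2a, a minimal sketch of the hash follows. It shows the general algorithm (start at 5381, multiply by 33, XOR in each byte) with the result truncated to 64 bits; the 64-bit width and the function name are assumptions, and this is not a copy of getHostHash().

```python
MASK64 = (1 << 64) - 1  # assumed: the hash is kept in an unsigned 64-bit value


def djb2a(text: str) -> int:
    """DJB2a string hash: result = (result * 33) XOR byte, starting at 5381."""
    result = 5381
    for byte in text.encode("utf-8"):
        # Masking emulates the wrap-around of fixed-width unsigned arithmetic.
        result = ((result * 33) ^ byte) & MASK64
    return result
```

Two hostnames that differ in a single character, such as `node1` and `node2`, produce different values, which is the property a host-identification hash needs.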
Sylvain Jeaugey
bd0755c95c
Merge pull request #48 from NVIDIA/fix-makefile-typo
...
Fix typo in src/Makefile
2020-06-24 14:52:55 -07:00
Luke Yeager
afdaf59b3b
Fix typo in src/Makefile
2020-06-24 14:39:22 -07:00
Sylvain Jeaugey
b2603a2e85
Add gencode for CUDA11
2020-06-23 18:16:46 -07:00
Sylvain Jeaugey
ec1b5e22e6
Change all_gather/reduce_scatter algbw to match the documentation.
...
Fix #45 : All_gather and reduce_scatter algorithm bandwidth was
computed as time/count*(nranks-1) which is not consistent with the
way we compute it for other collectives.
This change makes algbw higher; busbw is unchanged.
2020-06-19 10:42:19 -07:00
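The relationship between the two bandwidth figures can be sketched numerically. The sketch assumes that algorithm bandwidth is total data size divided by elapsed time, and that bus bandwidth for all_gather and reduce_scatter applies a factor of (nranks-1)/nranks to it; the function names are hypothetical.

```python
def algbw_gbps(total_bytes: float, seconds: float) -> float:
    """Algorithm bandwidth in GB/s: data size divided by elapsed time."""
    return total_bytes / seconds / 1e9


def busbw_gbps(total_bytes: float, seconds: float, nranks: int) -> float:
    """Bus bandwidth in GB/s for all_gather / reduce_scatter.

    Assumption: busbw = algbw * (nranks - 1) / nranks, since each rank
    already holds 1/nranks of the data and only the rest is transferred.
    """
    return algbw_gbps(total_bytes, seconds) * (nranks - 1) / nranks
```

With 8 ranks, 8 GB of total data and a one-second run, this gives an algbw of 8 GB/s and a busbw of 7 GB/s.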
Sylvain Jeaugey
07ac716c1a
Fix #47 : compilation error on NCCL<2.7
...
Return an error when trying to run alltoall test when compiled
against NCCL<2.7.
2020-06-18 15:02:51 -07:00
Sylvain Jeaugey
a7b304dde5
Merge pull request #46 from NVIDIA/p2p
...
Add alltoall perf test
2020-06-17 10:45:29 -07:00
Luke Yeager
af4fa0f4cf
Fix some memory leaks
2020-06-17 10:44:32 -07:00
Sylvain Jeaugey
7a833631b2
Remove sm_30
2020-06-15 08:54:21 -07:00
Sylvain Jeaugey
ba924dac95
Fix #43 : Add .gitignore for build dir
2020-06-03 15:10:38 -07:00
Sylvain Jeaugey
119a0ecf60
Add alltoall perf test
2020-03-17 12:00:19 -07:00
Sylvain Jeaugey
c864b73a27
Merge pull request #31 from wzamazon/fix_makefile
...
Add -L$(MPI_HOME)/lib64 to NVLDFLAGS
2020-01-06 10:38:40 -08:00
Wei Zhang
0f173234bb
Add -L$(MPI_HOME)/lib64 to NVLDFLAGS
...
In some cases, the MPI library is not in $(MPI_HOME)/lib but
in $(MPI_HOME)/lib64, for example on Red Hat-like Linux systems
(CentOS, Amazon Linux) where MPI is installed by yum or rpm.
Under such circumstances, the current Makefile causes a build failure.
This patch addresses the issue by adding -L$(MPI_HOME)/lib64 to
NVLDFLAGS in src/Makefile.
Signed-off-by: Wei Zhang <wzam@amazon.com>
2019-12-16 16:18:22 -08:00
Sylvain Jeaugey
a2af1d959d
Update README.md
...
Checks are now fully local, no need to disable them at scale.
2019-10-10 10:51:05 -07:00
Sylvain Jeaugey
ca7a565236
Update README.md
2019-08-16 09:06:28 -07:00
David Addison
cbe7f65400
Resync all tests with test code from NCCL 2.4
...
Major rework to merge most of the changes from the NCCL internal
tests into the public ones
Added "-m <agg_iters>" operation aggregation option.
Data integrity checking is now much more performant at scale.
Startup times at scale are improved.
Test latency units are now displayed in usec.
2019-04-05 13:42:15 -07:00
Sylvain Jeaugey
dcf818955f
Added a clarification about AllGather and ReduceScatter sizes, since NCCL uses the size per rank.
2018-08-17 14:58:44 -07:00
Sylvain Jeaugey
eb4c43ff3d
Clarification
2018-01-30 09:17:29 -08:00
Sylvain Jeaugey
e00cb1f1c4
Typos/Clarifications
2018-01-30 09:15:58 -08:00
Sylvain Jeaugey
db39a88f8a
Fix link to performance page
2018-01-30 09:14:49 -08:00
Sylvain Jeaugey
222f94f949
Added explanation about performance numbers
2018-01-30 09:13:52 -08:00
Sylvain Jeaugey
925a70576e
Print NCCL version at start
2017-12-21 15:10:09 -08:00
Sylvain Jeaugey
25016c8eeb
Fix NCCL_HOME to be consistent with README
2017-08-09 10:41:31 -07:00
Sylvain Jeaugey
9ec3e35276
Fix typo in Readme
2017-08-08 16:29:25 -07:00
Sylvain Jeaugey
a15599f5cf
Improve Readme
2017-08-08 16:28:46 -07:00