nccl-tests

mirror of https://github.com/NVIDIA/nccl-tests.git synced 2026-04-23 16:08:20 +08:00

Author	SHA1	Message	Date
John Bachan	51af5572bf	Resync with NCCL 2.13 * Added "verifiable", a suite of kernels for generating and verifying reduction input and output arrays in a bit-precise way. * Data corruption errors now reported in number of wrong elements instead of max deviation. * Use ncclGetLastError. * Don't run hypercube on non-powers of 2 ranks. * Fix to hypercube data verification. * Use "thread local" as the default CUDA capture mode. * Replaced pthread_yield -> sched_yield() * Bugfix to the cpu-side barrier/allreduce implementations.	2022-08-22 17:51:06 -07:00
David Addison	8274cb47b6	Merge pull request #96 from NVIDIA/nersc-linkage-fix Add option to statically link cudart	2022-05-26 16:54:44 -07:00
David Addison	de3ddbe261	Add option to statically link cudart Build with CUDARTLIB=cudart_static to remove dynamic linkage Also removed unused curand and nvToolsExt dependencies BUG 95	2021-11-10 10:02:41 -08:00
David Addison	7130fa6096	Add MPI_IBM build option	2021-10-25 16:30:57 -07:00
David Addison	f773748b46	Resync with NCCL 2.11 New operator: mulsum New test: gather	2021-09-17 09:02:45 -07:00
David Addison	1f8f541686	Add CUDA graph support only for CUDA 11.3 and later builds Fixes #90	2021-07-13 10:47:47 -07:00
David Addison	b9f90d12a9	Removed MPI_SUPPORT conditional compilation of average flag	2021-07-12 11:43:57 -07:00
David Addison	547e119d35	Fix issues with MPI_Allreduce and multi-threaded tests	2021-07-08 16:42:40 -07:00
David Addison	11cff17a04	Updated with new command line arguments	2021-07-06 16:27:45 -07:00
David Addison	f476f4a17a	Merge branch 'bfloat16'	2021-07-06 10:20:32 -07:00
David Addison	1dfc76eccc	Added new option to report average iteration time	2021-06-30 19:36:07 -07:00
David Addison	1ae8cdc315	Resync with changes in gitlab-master code	2021-06-30 13:16:04 -07:00
David Addison	44df0bf010	Merge pull request #88 from nzmsv/master Cleanup argument error handling and messages	2021-06-30 12:35:47 -07:00
David Addison	9dae3d3a37	Added new tests: scatter, sendrecv, hypercube	2021-06-28 16:49:10 -07:00
David Addison	e55ad3796d	Added support for CUDA graph capture/replay (-G)	2021-06-28 14:19:45 -07:00
David Addison	526eacadf7	Fixed formatting for bfloat16 support	2021-06-28 10:12:34 -07:00
David Addison	cde7e769c1	Add support for ncclAvg operation	2021-06-28 09:41:58 -07:00
Greg Inozemtsev	c4de829d91	Cleanup argument error handling and messages Add error checking for minbytes and maxbytes arguments Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.	2021-06-04 21:47:40 +00:00
Sylvain Jeaugey	e12c35d84b	Update PERFORMANCE.md	2021-05-27 09:12:52 -07:00
David Addison	e37545e491	Add support for new datatype: bfloat16	2021-03-15 17:13:35 -07:00
David Addison	0b30de583f	Merge pull request #67 from NVIDIA/big_buffers Do not allocate memory for expected buffer if checking disabled	2021-02-04 09:24:09 -08:00
David Addison	7677f3f608	Do not allocate memory for expected buffer if checking disabled This allows the tests to be run with larger buffers	2021-01-20 17:08:40 -08:00
David Addison	2f9bba9f20	Merge pull request #64 from NVIDIA/hosthash_boot_id Add boot_id to the hostname hash due to collisions on Azure	2021-01-11 10:02:20 -08:00
David Addison	ae1ce98e69	Add boot_id to the hostname hash due to collisions on Azure Fixes #60	2021-01-04 11:38:45 -08:00
Sylvain Jeaugey	464f038106	Merge pull request #61 from jithinjosepkl/master Use DJB2a hash algorithm in getHostHash()	2020-12-18 10:39:43 -08:00
Jithin Jose	da67a81c8e	Use DJB2a hash algorithm in getHostHash()	2020-12-18 10:12:54 -08:00
Sylvain Jeaugey	bd0755c95c	Merge pull request #48 from NVIDIA/fix-makefile-typo Fix typo in src/Makefile	2020-06-24 14:52:55 -07:00
Luke Yeager	afdaf59b3b	Fix typo in src/Makefile	2020-06-24 14:39:22 -07:00
Sylvain Jeaugey	b2603a2e85	Add gencode for CUDA11	2020-06-23 18:16:46 -07:00
Sylvain Jeaugey	ec1b5e22e6	Change all_gather/reduce_scatter algbw to match the documentation. Fix #45 : All_gather and reduce_scatter algorithm bandwidth was computed as time/count*(nranks-1) which is not consistent with the way we compute it for other collectives. This change makes algbw higher; busbw is unchanged.	2020-06-19 10:42:19 -07:00
Sylvain Jeaugey	07ac716c1a	Fix #47 : compilation error on NCCL<2.7 Return an error when trying to run alltoall test when compiled against NCCL<2.7.	2020-06-18 15:02:51 -07:00
Sylvain Jeaugey	a7b304dde5	Merge pull request #46 from NVIDIA/p2p Add alltoall perf test	2020-06-17 10:45:29 -07:00
Luke Yeager	af4fa0f4cf	Fix some memory leaks	2020-06-17 10:44:32 -07:00
Sylvain Jeaugey	7a833631b2	Remove sm_30	2020-06-15 08:54:21 -07:00
Sylvain Jeaugey	ba924dac95	Fix #43 : Add .gitignore for build dir	2020-06-03 15:10:38 -07:00
Sylvain Jeaugey	119a0ecf60	Add alltoall perf test	2020-03-17 12:00:19 -07:00
Sylvain Jeaugey	c864b73a27	Merge pull request #31 from wzamazon/fix_makefile Add -L$(MPI_HOME)/lib64 to NVLDFLAGS	2020-01-06 10:38:40 -08:00
Wei Zhang	0f173234bb	Add -L$(MPI_HOME)/lib64 to NVLDFLAGS In some cases, the MPI library is not in $(MPI_HOME)/lib but in $(MPI_HOME)/lib64, for example on RedHat-like Linux systems (CentOS, Amazon Linux) where MPI is installed by yum or rpm. Under such circumstances, the current makefile causes a build failure. This patch addresses the issue by adding -L$(MPI_HOME)/lib64 to NVLDFLAGS in src/Makefile. Signed-off-by: Wei Zhang <wzam@amazon.com>	2019-12-16 16:18:22 -08:00
Sylvain Jeaugey	a2af1d959d	Update README.md Checks are now fully local, no need to disable them at scale.	2019-10-10 10:51:05 -07:00
Sylvain Jeaugey	ca7a565236	Update README.md	2019-08-16 09:06:28 -07:00
David Addison	cbe7f65400	Resync all tests with test code from NCCL 2.4 Major rework to merge most of the changes from the NCCL internal tests into the public ones Added "-m <agg_iters>" operation aggregation option. Data integrity checking is now much more performant at scale. Startup times at scale are improved. Test latency units are now displayed in usec.	2019-04-05 13:42:15 -07:00
Sylvain Jeaugey	dcf818955f	Added a precision for AllGather and ReduceScatter sizes since NCCL uses the size per rank.	2018-08-17 14:58:44 -07:00
Sylvain Jeaugey	eb4c43ff3d	Clarification	2018-01-30 09:17:29 -08:00
Sylvain Jeaugey	e00cb1f1c4	Typos/Clarifications	2018-01-30 09:15:58 -08:00
Sylvain Jeaugey	db39a88f8a	Fix link to performance page	2018-01-30 09:14:49 -08:00
Sylvain Jeaugey	222f94f949	Added explanation about performance numbers	2018-01-30 09:13:52 -08:00
Sylvain Jeaugey	925a70576e	Print NCCL version at start	2017-12-21 15:10:09 -08:00
Sylvain Jeaugey	25016c8eeb	Fix NCCL_HOME to be consistent with README	2017-08-09 10:41:31 -07:00
Sylvain Jeaugey	9ec3e35276	Fix typo in Readme	2017-08-08 16:29:25 -07:00
Sylvain Jeaugey	a15599f5cf	Improve Readme	2017-08-08 16:28:46 -07:00


52 Commits