nccl-tests

mirror of https://github.com/NVIDIA/nccl-tests.git synced 2026-02-04 02:01:05 +08:00

Author	SHA1	Message	Date
David Addison	cb6a46fdd6	Update CUDA gencodes Add support for Blackwell sm100 and sm120 from CUDA 12.8 Add support for Hopper sm90 from CUDA 12.0	2025-01-25 17:32:16 -08:00
John Bachan	29f4114f02	Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes. This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.	2024-12-18 11:20:28 -08:00
Sylvain Jeaugey	8dfeab9eb9	Merge pull request #259 from NVIDIA/fix-ncclstringtotype Future-proof ncclstringtotype	2024-10-24 10:28:02 -07:00
Kamil Iskra	34d6d53910	Future-proof ncclstringtotype Ensure that ncclstringtotype iterates only over data types known to nccl-tests (as indicated by test_typenum), not over a potentially larger set of all NCCL types.	2024-10-24 09:21:37 -07:00
David Addison	9d26b8422b	Merge pull request #226 from netgroup/master improve parsing of stepbytes (increment size) argument	2024-07-30 14:58:54 -07:00
David Addison	0d86b5a6e7	Added some missing command line options to README.md Also updated single and multi-node examples.	2024-07-30 14:50:45 -07:00
David Addison	d2d40cc824	Added -N,--run_cycles option	2024-07-25 22:00:23 -07:00
David Addison	3a3f790efd	Merge pull request #240 from OrenLeung/patch-1 doc: add all2all factor	2024-07-25 22:00:06 -07:00
Oren	c6eb15875f	doc: add all2all factor	2024-07-24 22:55:00 -04:00
Stefano Salsano	746549b28d	improve parsing of stepbytes (increment size) argument	2024-06-14 11:28:55 +02:00
Kaiming Ouyang	d028efcf35	Change ncclCommRegister size to maxBytes in serial comm init	2024-06-06 06:54:48 -07:00
Giuseppe Congiu	a1efb427e7	Add -R option to register user buffers	2024-06-03 01:04:58 -07:00
David Addison	c6afef0b6f	Added missing MPI_Comm_free() call before MPI_Finalize()	2024-02-05 08:53:54 -08:00
David Addison	1292b25553	Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue	2023-10-12 16:53:32 -07:00
David Addison	6c46206a47	Make the -c option be a datacheck iteration count parameter Default is 1	2023-09-13 14:03:38 -07:00
Sylvain Jeaugey	1a5f551ffd	Merge pull request #146 from yangxingwu/master makefile: remove extra space	2023-06-06 11:58:24 +02:00
yangxingwu	52ea1b2148	makefile: remove extra space	2023-06-06 09:47:50 +00:00
Sylvain Jeaugey	e98ef24bc0	Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling fix handling of variable NVCC.	2023-03-27 11:14:10 +02:00
alan.souza	7ccda3c97b	fix handling of variable NVCC. Permit overriding the variable using environment variables	2023-03-25 16:56:16 -03:00
David Addison	e76e36e9a9	Merge pull request #134 from flx42/patch-1 Update README.md to fix -i default increment value.	2023-03-23 09:53:15 -07:00
Felix Abecassis	17d0a42d5a	Update README.md	2023-03-23 09:05:41 -07:00
Sylvain Jeaugey	2cbb968101	Update README.md Improve MPI example to avoid confusion of number of processes / total number of GPUs. https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369	2023-01-03 08:47:43 +01:00
David Addison	0b4c4cb99f	Add boot_id to the hostname hash due to collisions on Azure Fixes #60	2022-12-12 01:16:46 -08:00
Jithin Jose	0aeba157db	Use DJB2a hash algorithm in getHostHash()	2022-12-12 01:16:38 -08:00
David Addison	24fcf64ed1	Call cudaFreeHost() on wrongPerGpu not cudaFree()	2022-11-22 11:18:37 -08:00
David Addison	3bd2bd292b	Add fflush(stdout) before perf output	2022-11-22 11:16:47 -08:00
Sylvain Jeaugey	365b92a1ea	Fix build on RHEL7 with GCC 4.8 Add -std=c++11 to CXXFLAGS. Fixes #116.	2022-10-12 01:24:14 -07:00
Sylvain Jeaugey	d313d20a26	Update NCCL tests	2022-09-23 01:13:29 -07:00
David Addison	749573f2d6	Fix preprocessor version check for ncclGetLastError() ncclGetLastError() was added in NCCL 2.13.0	2022-09-07 16:10:41 -07:00
David Addison	afa4c56b6a	Fix an issue with the last commit when data checking is disabled	2022-09-07 11:23:49 -07:00
David Addison	a0a14911ee	Display N/A for error count in AlltoAll in-place test AlltoAll does not support in-place buffers	2022-09-06 13:17:15 -07:00
John Bachan	bc5f7cfb0a	Changed top-level Makefile behavior so that BUILDDIR is interpreted as relative to top-level directory. This is done by abspath'ing it before passing it to subdirectory Makefiles. The old behavior had two cases: with and without BUILDDIR being set by the user. With BUILDDIR not set, the build dir would be named "build" in the top-level directory. If BUILDDIR was set, then the build dir would be placed at "src/${BUILDDIR}". The new behavior is simpler: if BUILDDIR is not set then it defaults to "build", and the directory holding the final build is always at just "${BUILDDIR}" in the top level.	2022-08-23 10:08:49 -07:00
John Bachan	51af5572bf	Resync with NCCL 2.13 * Added "verifiable", a suite of kernels for generating and verifying reduction input and output arrays in a bit-precise way. * Data corruption errors now reported in number of wrong elements instead of max deviation. * Use ncclGetLastError. * Don't run hypercube on non-powers of 2 ranks. * Fix to hypercube data verification. * Use "thread local" as the default CUDA capture mode. * Replaced pthread_yield -> sched_yield() * Bugfix to the cpu-side barrier/allreduce implementations.	2022-08-22 17:51:06 -07:00
David Addison	8274cb47b6	Merge pull request #96 from NVIDIA/nersc-linkage-fix Add option to statically link cudart	2022-05-26 16:54:44 -07:00
David Addison	de3ddbe261	Add option to statically link cudart Build with CUDARTLIB=cudart_static to remove dynamic linkage Also removed unused curand and nvToolsExt dependencies BUG 95	2021-11-10 10:02:41 -08:00
David Addison	7130fa6096	Add MPI_IBM build option	2021-10-25 16:30:57 -07:00
David Addison	f773748b46	Resync with NCCL 2.11 New operator: mulsum New test: gather	2021-09-17 09:02:45 -07:00
David Addison	1f8f541686	Add CUDA graph support only for CUDA 11.3 and later builds Fixes #90	2021-07-13 10:47:47 -07:00
David Addison	b9f90d12a9	Removed MPI_SUPPORT conditional compilation of average flag	2021-07-12 11:43:57 -07:00
David Addison	547e119d35	Fix issues with MPI_Allreduce and multi-threaded tests	2021-07-08 16:42:40 -07:00
David Addison	11cff17a04	Updated with new command line arguments	2021-07-06 16:27:45 -07:00
David Addison	f476f4a17a	Merge branch 'bfloat16'	2021-07-06 10:20:32 -07:00
David Addison	1dfc76eccc	Added new option to report average iteration time	2021-06-30 19:36:07 -07:00
David Addison	1ae8cdc315	Resync with changes in gitlab-master code	2021-06-30 13:16:04 -07:00
David Addison	44df0bf010	Merge pull request #88 from nzmsv/master Cleanup argument error handling and messages	2021-06-30 12:35:47 -07:00
David Addison	9dae3d3a37	Added new tests: scatter, sendrecv, hypercube	2021-06-28 16:49:10 -07:00
David Addison	e55ad3796d	Added support for CUDA graph capture/replay (-G)	2021-06-28 14:19:45 -07:00
David Addison	526eacadf7	Fixed formatting for bfloat16 support	2021-06-28 10:12:34 -07:00
David Addison	cde7e769c1	Add support for ncclAvg operation	2021-06-28 09:41:58 -07:00
Greg Inozemtsev	c4de829d91	Cleanup argument error handling and messages Add error checking for minbytes and maxbytes arguments Also accept lowercase literals when parsing size arguments and print errors and usage on stderr.	2021-06-04 21:47:40 +00:00
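
Commit 29f4114f02 above describes trimming per-rank buffer sizes to multiples of 16 bytes so that non-power-of-2 rank counts still get well-aligned buffer addresses. The idea can be sketched as follows. This is a minimal illustration, not the code from that commit: the helper name `perRankBytesAligned` and its signature are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not taken from nccl-tests): split totalBytes evenly
// across nranks, then round each rank's share down to a multiple of 16 bytes.
inline std::size_t perRankBytesAligned(std::size_t totalBytes, int nranks) {
  assert(nranks > 0);
  const std::size_t kAlign = 16;  // alignment named in the commit message
  std::size_t share = totalBytes / static_cast<std::size_t>(nranks);
  return share - (share % kAlign);  // drop the unaligned tail
}
```

If the base buffer is itself 16-byte aligned and rank r starts at offset r times the trimmed share, every rank's start address is also a multiple of 16, whatever the rank count.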


84 Commits
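
Commit 0aeba157db in the log switches getHostHash() to the DJB2a hash. For readers unfamiliar with it, DJB2a is the XOR variant of Bernstein's string hash: start at 5381, and for each byte multiply the running value by 33 and XOR in the byte. The sketch below shows the general algorithm only. The function name `djb2aHash` and the 64-bit result type are assumptions, not the nccl-tests implementation.

```cpp
#include <cstdint>

// Generic DJB2a string hash (illustrative; name and width are assumptions).
inline std::uint64_t djb2aHash(const char* s) {
  std::uint64_t hash = 5381;  // conventional DJB2 seed
  for (; *s != '\0'; ++s) {
    // hash * 33, written as shift-and-add, then XOR with the next byte
    hash = ((hash << 5) + hash) ^ static_cast<unsigned char>(*s);
  }
  return hash;
}
```

Per commit 0b4c4cb99f, the hashed input also includes the boot_id, so hosts that share a hostname no longer collide.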