nccl-tests

mirror of https://github.com/NVIDIA/nccl-tests.git synced 2026-01-14 02:47:21 +08:00

Author	SHA1	Message	Date
Kaiming Ouyang	59072b7e3d	Add symmetric registration support -R 2 will enable symmetric registration	2025-03-14 17:04:06 -07:00
David Addison	b4300cc79d	Add PCI domain and device ID for GPU device BDF display	2025-02-28 13:25:51 -08:00
Sylvain Jeaugey	903918fc54	Add NCCL_TESTS_SPLIT documentation in the README	2025-02-06 14:10:07 +01:00
Junyu Ma	a89cf07fe8	Perftests: Introduce NCCL_TESTS_SPLIT env. `NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators. It is overridden by `NCCL_TESTS_SPLIT_MASK`. Examples: NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node. NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7. NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system. NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72. You can also use "%", "&", "|", "/" for short. Extra spaces in the middle are ignored, and the value is not case sensitive. The following are all equivalent: NCCL_TESTS_SPLIT="&0x7", NCCL_TESTS_SPLIT="&0b111", NCCL_TESTS_SPLIT="AND 7", NCCL_TESTS_SPLIT="and 0x7".	2025-02-04 15:18:09 -08:00
David Addison	cb6a46fdd6	Update CUDA gencodes Add support for Blackwell sm100 and sm120 from CUDA 12.8 Add support for Hopper sm90 from CUDA 12.0	2025-01-25 17:32:16 -08:00
John Bachan	29f4114f02	Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes. This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.	2024-12-18 11:20:28 -08:00
Sylvain Jeaugey	8dfeab9eb9	Merge pull request #259 from NVIDIA/fix-ncclstringtotype Future-proof ncclstringtotype	2024-10-24 10:28:02 -07:00
Kamil Iskra	34d6d53910	Future-proof ncclstringtotype Ensure that ncclstringtotype iterates only over data types known to nccl-tests (as indicated by test_typenum), not over a potentially larger set of all NCCL types.	2024-10-24 09:21:37 -07:00
David Addison	9d26b8422b	Merge pull request #226 from netgroup/master improve parsing of stepbytes (increment size) argument	2024-07-30 14:58:54 -07:00
David Addison	0d86b5a6e7	Added some missing command line options to README.md Also updated single and multi-node examples.	2024-07-30 14:50:45 -07:00
David Addison	d2d40cc824	Added -N,--run_cycles option	2024-07-25 22:00:23 -07:00
David Addison	3a3f790efd	Merge pull request #240 from OrenLeung/patch-1 doc: add all2all factor	2024-07-25 22:00:06 -07:00
Oren	c6eb15875f	doc: add all2all factor	2024-07-24 22:55:00 -04:00
Stefano Salsano	746549b28d	improve parsing of stepbytes (increment size) argument	2024-06-14 11:28:55 +02:00
Kaiming Ouyang	d028efcf35	Change ncclCommRegister size to maxBytes in serial comm init	2024-06-06 06:54:48 -07:00
Giuseppe Congiu	a1efb427e7	Add -R option to register user buffers	2024-06-03 01:04:58 -07:00
David Addison	c6afef0b6f	Added missing MPI_Comm_free() call before MPI_Finalize()	2024-02-05 08:53:54 -08:00
David Addison	1292b25553	Added an MPI_Barrier() call after MPI_Bcast() for HCOLL issue	2023-10-12 16:53:32 -07:00
David Addison	6c46206a47	Make the -c option be a datacheck iteration count parameter Default is 1	2023-09-13 14:03:38 -07:00
Sylvain Jeaugey	1a5f551ffd	Merge pull request #146 from yangxingwu/master makefile: remove extra space	2023-06-06 11:58:24 +02:00
yangxingwu	52ea1b2148	makefile: remove extra space	2023-06-06 09:47:50 +00:00
Sylvain Jeaugey	e98ef24bc0	Merge pull request #135 from aavbsouza/fix_nvcc_variable_handling fix handling of variable NVCC.	2023-03-27 11:14:10 +02:00
alan.souza	7ccda3c97b	fix handling of variable NVCC. Permit overriding the variable using environment variables	2023-03-25 16:56:16 -03:00
David Addison	e76e36e9a9	Merge pull request #134 from flx42/patch-1 Update README.md to fix -i default increment value.	2023-03-23 09:53:15 -07:00
Felix Abecassis	17d0a42d5a	Update README.md	2023-03-23 09:05:41 -07:00
Sylvain Jeaugey	2cbb968101	Update README.md Improve MPI example to avoid confusion of number of processes / total number of GPUs. https://github.com/NVIDIA/nccl-tests/issues/54#issuecomment-1212023369	2023-01-03 08:47:43 +01:00
David Addison	0b4c4cb99f	Add boot_id to the hostname hash due to collisions on Azure Fixes #60	2022-12-12 01:16:46 -08:00
Jithin Jose	0aeba157db	Use DJB2a hash algorithm in getHostHash()	2022-12-12 01:16:38 -08:00
David Addison	24fcf64ed1	Call cudaFreeHost() on wrongPerGpu not cudaFree()	2022-11-22 11:18:37 -08:00
David Addison	3bd2bd292b	Add fflush(stdout) before perf output	2022-11-22 11:16:47 -08:00
Sylvain Jeaugey	365b92a1ea	Fix build on RHEL7 with GCC 4.8 Add -std=c++11 to CXXFLAGS. Fixes #116.	2022-10-12 01:24:14 -07:00
Sylvain Jeaugey	d313d20a26	Update NCCL tests	2022-09-23 01:13:29 -07:00
David Addison	749573f2d6	Fix preprocessor version check for ncclGetLastError() ncclGetLastError() was added in NCCL 2.13.0	2022-09-07 16:10:41 -07:00
David Addison	afa4c56b6a	Fix an issue with the last commit when data checking is disabled	2022-09-07 11:23:49 -07:00
David Addison	a0a14911ee	Display N/A for error count in AlltoAll in-place test AlltoAll does not support in-place buffers	2022-09-06 13:17:15 -07:00
John Bachan	bc5f7cfb0a	Changed top-level Makefile behavior so that BUILDDIR is interpreted as relative to top-level directory. This is done by abspath'ing it before passing it to subdirectory Makefiles. The old behavior had two cases: with and without BUILDDIR being set by the user. With BUILDDIR not set, the build dir would be named "build" in the top-level directory. If BUILDDIR was set, then the build dir would be placed at "src/${BUILDDIR}". The new behavior is simpler: if BUILDDIR is not set then it defaults to "build", and the directory holding the final build is always at just "${BUILDDIR}" in the top level.	2022-08-23 10:08:49 -07:00
John Bachan	51af5572bf	Resync with NCCL 2.13 * Added "verifiable", a suite of kernels for generating and verifying reduction input and output arrays in a bit-precise way. * Data corruption errors now reported in number of wrong elements instead of max deviation. * Use ncclGetLastError. * Don't run hypercube on non-powers of 2 ranks. * Fix to hypercube data verification. * Use "thread local" as the default CUDA capture mode. * Replaced pthread_yield -> sched_yield() * Bugfix to the cpu-side barrier/allreduce implementations.	2022-08-22 17:51:06 -07:00
David Addison	8274cb47b6	Merge pull request #96 from NVIDIA/nersc-linkage-fix Add option to statically link cudart	2022-05-26 16:54:44 -07:00
David Addison	de3ddbe261	Add option to statically link cudart Build with CUDARTLIB=cudart_static to remove dynamic linkage Also removed unused curand and nvToolsExt dependencies BUG 95	2021-11-10 10:02:41 -08:00
David Addison	7130fa6096	Add MPI_IBM build option	2021-10-25 16:30:57 -07:00
David Addison	f773748b46	Resync with NCCL 2.11 New operator: mulsum New test: gather	2021-09-17 09:02:45 -07:00
David Addison	1f8f541686	Add CUDA graph support only for CUDA 11.3 and later builds Fixes #90	2021-07-13 10:47:47 -07:00
David Addison	b9f90d12a9	Removed MPI_SUPPORT conditional compilation of average flag	2021-07-12 11:43:57 -07:00
David Addison	547e119d35	Fix issues with MPI_Allreduce and multi-threaded tests	2021-07-08 16:42:40 -07:00
David Addison	11cff17a04	Updated with new command line arguments	2021-07-06 16:27:45 -07:00
David Addison	f476f4a17a	Merge branch 'bfloat16'	2021-07-06 10:20:32 -07:00
David Addison	1dfc76eccc	Added new option to report average iteration time	2021-06-30 19:36:07 -07:00
David Addison	1ae8cdc315	Resync with changes in gitlab-master code	2021-06-30 13:16:04 -07:00
David Addison	44df0bf010	Merge pull request #88 from nzmsv/master Cleanup argument error handling and messages	2021-06-30 12:35:47 -07:00
David Addison	9dae3d3a37	Added new tests: scatter, sendrecv, hypercube	2021-06-28 16:49:10 -07:00


88 Commits
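
The color rule described in commit a89cf07fe8 (NCCL_TESTS_SPLIT) can be illustrated with a small sketch. This is not the nccl-tests implementation: the function name `split_color` and the parsing approach are assumptions made for illustration, and it only models the operators named in the commit message (AND/&, MOD/%, DIV//, and the short form |, whose long name the message does not give). The precedence of `NCCL_TESTS_SPLIT_MASK` over `NCCL_TESTS_SPLIT` is not modeled here.

```python
# Hypothetical sketch of the NCCL_TESTS_SPLIT color rule from commit a89cf07fe8.
# split_color() is an illustrative name, not a function in nccl-tests.

# Operator spellings taken from the commit message. The long name for "|"
# is not stated there, so only the short form is accepted for it.
_OPS = {
    "AND": lambda rank, value: rank & value,
    "&": lambda rank, value: rank & value,
    "MOD": lambda rank, value: rank % value,
    "%": lambda rank, value: rank % value,
    "DIV": lambda rank, value: rank // value,
    "/": lambda rank, value: rank // value,
    "|": lambda rank, value: rank | value,
}


def split_color(rank: int, spec: str) -> int:
    """Return the communicator color for `rank` under a spec such as "MOD 72"."""
    # Spaces are ignored and matching is case-insensitive, per the commit message.
    compact = "".join(spec.split()).upper()
    # Try longer operator names first so "AND" is matched as a whole word.
    for name in sorted(_OPS, key=len, reverse=True):
        if compact.startswith(name):
            # Base 0 accepts decimal, 0x (hex) and 0b (binary) literals.
            value = int(compact[len(name):], 0)
            return _OPS[name](rank, value)
    raise ValueError(f"unrecognized split spec: {spec!r}")


if __name__ == "__main__":
    for spec in ("AND 0x7", "&0b111", "MOD 72", "DIV 72"):
        print(spec, "->", split_color(100, spec))
```

Under this sketch, "AND 0x7", "and 0x7", "AND 7" and "&0b111" all give the same color for a given rank, matching the equivalence the commit message describes.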