nccl-tests

mirror of https://github.com/NVIDIA/nccl-tests.git synced 2026-01-13 18:37:16 +08:00

Author	SHA1	Message	Date
David Addison	81463c58d0	NCCL_TESTS_VERSION 2.17.8	2026-01-06 15:00:17 -08:00
David Addison	7278698c1b	Clarified use of Mebibytes and Gibibytes for sizes	2026-01-06 14:59:17 -08:00
Katie Gioioso	2656c58421	NCCL_TESTS_VERSION 2.17.7	2025-12-30 20:18:25 +00:00
Katie Gioioso	070d17528c	refactor comm init	2025-12-30 20:18:25 +00:00
Katie Gioioso	332e61896f	device api 2.28 is not compatible with 2.29. Check versions and print error if there is a mismatch	2025-12-30 20:18:25 +00:00
Katie Gioioso	24874bdaa8	Compatibility with 2.29 device API: use NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER, query properties to check for device api support	2025-12-30 20:18:24 +00:00
David Addison	7106245178	Add include of <limits> due to compilation error	2025-12-30 20:13:13 +00:00
David Addison	760c467f12	Add memory usage report option Use -M 1 to dump library memory usage information	2025-12-30 20:12:58 +00:00
David Addison	4bc314aa27	Add README.md text for -J option	2025-11-21 11:31:48 -08:00
David Addison	51f2e7ed7c	Remove trailing WS when timestamp option not used	2025-11-03 11:23:52 -08:00
David Addison	da0b547b1b	NCCL_TESTS_VERSION 2.17.6	2025-10-28 10:22:08 -07:00
David Addison	e2af90af76	Add new report_timestamps option to README.md	2025-10-28 10:21:58 -07:00
David Addison	a62c975681	Add option to suffix a timestamp to each perf line Based on code from yakovdyadkin & Scott Moe in MR 349 Adds -S 1 option to suffix each performance report line with a timestamp. Format is "%Y-%m-%d %H:%M:%S" This is especially useful when using the -N 0 option and looking for hangs or failure events.	2025-10-28 10:11:23 -07:00
David Addison	0bb567cc02	NCCL_TESTS_VERSION 2.17.5	2025-10-28 09:34:56 -07:00
Shane Snyder	013c49e930	add necessary ifdef guards for device API tests	2025-10-28 09:34:26 -07:00
Shane Snyder	f66d20e360	add runtime guards for ncclAlltoAll()	2025-10-28 09:32:17 -07:00
David Addison	3744121a2d	NCCL_TESTS_VERSION 2.17.4	2025-10-24 17:11:08 -07:00
David Addison	9641693e9b	Add PRINT of nccl-tests, NCCL header and library versions	2025-10-24 17:10:57 -07:00
Shane Snyder	9829ea42b5	add GIN-based device API kernels to alltoall - add GIN-only A2A kernel implementation - add hybrid LSA+GIN A2A kernel implementation - update perf test cases to expose a function for setting devCommRequirements for each device implementation and simplify devCommCreate code path to use this directly instead of complex fallback logic - add missing call to devCommDestroy	2025-10-24 17:10:34 -07:00
David Addison	00f52811b8	Add support for JSON output to perf test framework This adds support for writing structured information about the run to a JSON file. Enable with -J <filename>.json If the target JSON filename already exists then an incrementing numeric suffix will be added to create <filename>.json.<n>	2025-10-17 12:01:25 -07:00
Stephen Sachs	abc46770a9	Check if sufficient GPUs are available The CUDA error message "Test CUDA failure util.cu:706 'invalid device ordinal'" is not as helpful. Test this explicitly and guide the user.	2025-10-02 15:48:13 -07:00
Sylvain Jeaugey	9a5c15461a	Fix compilation for old NCCL versions Fix compilation failure on ctaPolicy with NCCL <= 2.26. Fix compilation failure on local_register with NCCL <= 2.18. Fix ctaPolicy behavior if the tests are compiled with NCCL <= 2.26 but run with NCCL >= 2.27.	2025-09-05 09:15:06 -07:00
David Addison	e12dbb0a14	Update to align with the NCCL 2.28 release Added Device API infrastructure and example kernels Two new command line arguments: -D <num> device kernel implementation to use <0/1/2/3/4> -V <num> number of CTAs to launch device kernels with Added new CTA Policy command line option: -x <policy> set the CTA Policy <0/1/2>	2025-09-04 17:23:22 -07:00
David Addison	c2cb96faac	Update NVCUFLAGS and CXXFLAGS to use -std=c++14	2025-08-29 14:55:31 -07:00
David Addison	f2015cbe82	Modified warmup to run for more message sizes Loops between minBytes and maxBytes doubling size each time Reduced default warmup iteration count to 1 (was 5)	2025-08-25 13:57:51 -07:00
David Addison	fae7cb4727	Merge pull request #316 from martin-belanger/print-program-name Print the name of the program being executed before and after test output	2025-07-24 14:58:54 -07:00
David Addison	6edafa0a9c	Add extra reserved space during maxBytes calculation Also, don't allow minBytes > maxBytes	2025-07-23 16:19:37 -07:00
David Addison	def2d3689c	Minor fix to Makefile Move comments to separate lines	2025-07-23 16:04:30 -07:00
David Addison	97ee098516	Add Turing (SM75) support to CUDA 13.0 builds	2025-06-04 17:54:58 -07:00
David Addison	e7c8825b0b	Wrap ncclCommWindowRegister() calls within ncclGroup	2025-06-03 10:36:53 -07:00
Martin Belanger	dafb70408d	Print the name of the program being executed One thing missing from the stdout of each performance test is the name of the test that is actually being run. This patch adds 2 new messages to the stdout. At the beginning of the execution of a test (e.g. sendrecv_perf) we will now see this message: Collective test starting: sendrecv_perf And at the end, we will now see this: Collective test concluded: sendrecv_perf This is needed when running several tests consecutively and we're trying to parse the stdout to collect the results. For example, using a Python script to parse the stdout, one could retrieve the results for each test and plot them on a graph. This patch makes it easier to implement such a script. Signed-off-by: Martin Belanger <martin.belanger@dell.com>	2025-06-03 11:43:02 -04:00
David Addison	5290298ab6	Reinstate Pascal support for CUDA 12.8+ builds	2025-06-02 09:29:52 -07:00
David Addison	8bc16f4e01	Need to drop Volta (sm_70) support from CUDA 13.0	2025-05-30 18:04:25 -07:00
David Addison	0c60e6a8e4	Fix formatting errors in README.md	2025-05-30 17:43:30 -07:00
David Addison	a5c539e68b	Add support for Symmetric Memory Registration From NCCL 2.27.x we can now use the Symmetric Memory APIs (-R 2)	2025-05-30 17:31:34 -07:00
David Addison	e041d901e6	Re-add sm_70 support for CUDA 12.8+ and 13.0 builds	2025-05-07 10:30:59 -07:00
David Addison	1021260ca9	Make verifiable a DSO and add NAME_SUFFIX support Build option DSO=1 generates libverifiable.so which can be used to reduce the combined binary size. Build option NAME_SUFFIX can be used to add a suffix to all generated binaries. e.g. NAME_SUFFIX=_mpi Added new make target: clean_intermediates	2025-04-23 17:07:24 -07:00
David Addison	501a149d57	Add support for FP8 datatypes Added new datatypes: f8e4m3, f8e5m2 Only supported on H100+ architectures and NCCL versions >= 2.24.0	2025-04-18 19:20:59 -07:00
David Addison	b4300cc79d	Add PCI domain and device ID for GPU device BDF display	2025-02-28 13:25:51 -08:00
Sylvain Jeaugey	903918fc54	Add NCCL_TESTS_SPLIT documentation in the README	2025-02-06 14:10:07 +01:00
Junyu Ma	a89cf07fe8	Perftests: Introduce NCCL_TESTS_SPLIT env `NCCL_TESTS_SPLIT` serves as a new way of computing the color for splitting communicators. Will be overridden by `NCCL_TESTS_SPLIT_MASK`. Examples: NCCL_TESTS_SPLIT_MASK="0x7" # color = rank & 0x7. What we do today to run on a DGX with one GPU per node. NCCL_TESTS_SPLIT="AND 0x7" # color = rank & 0x7. New way to run on one GPU per node on a DGX, equivalent to NCCL_TESTS_SPLIT_MASK=0x7 NCCL_TESTS_SPLIT="MOD 72" # color = rank % 72. One GPU per NVLink domain on an NVL72 system. NCCL_TESTS_SPLIT="DIV 72" # color = rank / 72. Intra NVLink domain on NVL72. You can also use: "%" "&" "|" "/" for short. Extra spaces in the middle will be automatically ignored. Not case sensitive. The following are all equivalent: NCCL_TESTS_SPLIT="%0x7" NCCL_TESTS_SPLIT="%0b111" NCCL_TESTS_SPLIT="AND 7" NCCL_TESTS_SPLIT="and 0x7"	2025-02-04 15:18:09 -08:00
David Addison	cb6a46fdd6	Update CUDA gencodes Add support for Blackwell sm100 and sm120 from CUDA 12.8 Add support for Hopper sm90 from CUDA 12.0	2025-01-25 17:32:16 -08:00
John Bachan	29f4114f02	Fixes to all tests that divide buffers by nranks so that they trim buffer sizes to be multiples of 16 bytes. This ensures non-pow2 ranks have buffer addresses aligned suitably for performance.	2024-12-18 11:20:28 -08:00
Sylvain Jeaugey	8dfeab9eb9	Merge pull request #259 from NVIDIA/fix-ncclstringtotype Future-proof ncclstringtotype	2024-10-24 10:28:02 -07:00
Kamil Iskra	34d6d53910	Future-proof ncclstringtotype Ensure that ncclstringtotype iterates only over data types known to nccl-tests (as indicated by test_typenum), not over a potentially larger set of all NCCL types.	2024-10-24 09:21:37 -07:00
David Addison	9d26b8422b	Merge pull request #226 from netgroup/master improve parsing of stepbytes (increment size) argument	2024-07-30 14:58:54 -07:00
David Addison	0d86b5a6e7	Added some missing command line options to README.md Also updated single and multi-node examples.	2024-07-30 14:50:45 -07:00
David Addison	d2d40cc824	Added -N,--run_cycles option	2024-07-25 22:00:23 -07:00
David Addison	3a3f790efd	Merge pull request #240 from OrenLeung/patch-1 doc: add all2all factor	2024-07-25 22:00:06 -07:00
Oren	c6eb15875f	doc: add all2all factor	2024-07-24 22:55:00 -04:00
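
The color rule described in commit a89cf07fe8 (NCCL_TESTS_SPLIT) can be illustrated with a small Python sketch. This is not the nccl-tests implementation, only a reading of the commit message: the function name `split_color`, the table `_OPS`, the word form "OR" for the "|" shorthand, and the use of integer division for DIV are all assumptions made for illustration.

```python
import re

# Operator spellings follow the commit message. The word form "or" is an
# assumption: the message only lists the "|" shorthand.
_OPS = {
    "and": lambda rank, value: rank & value,
    "&": lambda rank, value: rank & value,
    "or": lambda rank, value: rank | value,
    "|": lambda rank, value: rank | value,
    "mod": lambda rank, value: rank % value,
    "%": lambda rank, value: rank % value,
    "div": lambda rank, value: rank // value,  # assumed integer division
    "/": lambda rank, value: rank // value,
}


def split_color(spec: str, rank: int) -> int:
    """Return the communicator color for `rank` under a spec such as "MOD 72"."""
    # Whitespace is ignored and matching is case-insensitive, per the message.
    compact = "".join(spec.split()).lower()
    match = re.fullmatch(r"(and|or|mod|div|[&|%/])(.+)", compact)
    if match is None:
        raise ValueError(f"unrecognized split spec: {spec!r}")
    op, operand = match.groups()
    # Base 0 lets int() accept decimal, 0x... and 0b... operands.
    value = int(operand, 0)
    return _OPS[op](rank, value)


if __name__ == "__main__":
    for spec in ("AND 0x7", "MOD 72", "DIV 72"):
        print(spec, [split_color(spec, rank) for rank in (0, 7, 8, 75)])
```

Ranks that map to the same color would end up in the same split communicator, so "DIV 72" groups consecutive blocks of 72 ranks while "MOD 72" groups ranks that share an offset within their block.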


125 Commits