nccl-tests

mirror of https://github.com/NVIDIA/nccl-tests.git synced 2026-04-25 08:58:18 +08:00

Author	SHA1	Message	Date
David Addison	af1dcac92a	NCCL_TESTS_VERSION 2.18.2	2026-03-11 15:35:13 -07:00
David Addison	eb0d3d2a00	Display unalign setting in output	2026-03-11 15:05:54 -07:00
David Addison	e02c20b898	NCCL_TESTS_VERSION 2.18.1	2026-03-11 09:55:31 -07:00
David Addison	c1af7df1f3	Update -z option description in README.md	2026-03-11 09:54:53 -07:00
David Addison	ba52a70492	Allow blocking collectives without MPI_Barrier in timing loop	2026-03-11 09:36:53 -07:00
Theofilos Ioannis Manitaras	8d26b23319	Allocate buffers during thread initialization Signed-off-by: Theofilos Ioannis Manitaras <tmanitaras@nvidia.com>	2026-03-11 09:36:38 -07:00
David Addison	dd0bafd178	NCCL_TESTS_VERSION 2.18.0	2026-03-06 17:55:12 -08:00
David Addison	115fb09377	Add new unalign flag to README.md and update help text	2026-03-06 17:53:29 -08:00
David Addison	e986a6156c	Add -u <index> to force unaligned buffer addresses	2026-03-06 17:39:25 -08:00
David Addison	c379e19a71	NCCL_TESTS_VERSION 2.17.10	2026-03-05 15:35:13 -08:00
Ahsan Pervaiz	db221defdb	Request GIN to be explicitly enabled in all to all test Based on the changes in NCCL v2.29.3, update the alltoall test to either provide a ginConnectionType or set ginForceEnable to true. Signed-off-by: Ahsan Pervaiz <apervaiz@nvidia.com>	2026-03-05 15:34:19 -08:00
Marcin Malagowski	ae98985f55	Fix Clang compilation errors with VLA initialization Signed-off-by: David Addison <daddison@nvidia.com>	2026-02-09 10:38:44 -08:00
David Addison	9938d5a657	Fix compilation issues with latest NCCL release headers Add --extended-lambda to NVCUFLAGS	2026-02-04 16:43:20 -08:00
David Addison	2535da805b	NCCL_TESTS_VERSION 2.17.9	2026-02-03 11:04:48 -08:00
mykeduong	85ca91d1b1	Fix: corrected typos in the JSON output Signed-off-by: David Addison <daddison@nvidia.com>	2026-02-03 11:03:35 -08:00
David Addison	88d7e33207	Add -M memory report option to README.md	2026-01-15 13:32:55 -08:00
David Addison	81463c58d0	NCCL_TESTS_VERSION 2.17.8	2026-01-06 15:00:17 -08:00
David Addison	7278698c1b	Clarified use of Mebibytes and Gibibytes for sizes	2026-01-06 14:59:17 -08:00
Katie Gioioso	2656c58421	NCCL_TESTS_VERSION 2.17.7	2025-12-30 20:18:25 +00:00
Katie Gioioso	070d17528c	refactor comm init	2025-12-30 20:18:25 +00:00
Katie Gioioso	332e61896f	device api 2.28 is not compatible with 2.29. Check versions and print error if there is a mismatch	2025-12-30 20:18:25 +00:00
Katie Gioioso	24874bdaa8	Compatibility with 2.29 device API: use NCCL_DEV_COMM_REQUIREMENTS_INITIALIZER, query properties to check for device api support	2025-12-30 20:18:24 +00:00
David Addison	7106245178	Add include of <limits> due to compilation error	2025-12-30 20:13:13 +00:00
David Addison	760c467f12	Add memory usage report option Use -M 1 to dump library memory usage information	2025-12-30 20:12:58 +00:00
David Addison	4bc314aa27	Add README.md text for -J option	2025-11-21 11:31:48 -08:00
David Addison	51f2e7ed7c	Remove trailing WS when timestamp option not used	2025-11-03 11:23:52 -08:00
David Addison	da0b547b1b	NCCL_TESTS_VERSION 2.17.6	2025-10-28 10:22:08 -07:00
David Addison	e2af90af76	Add new report_timestamps option to README.md	2025-10-28 10:21:58 -07:00
David Addison	a62c975681	Add option to suffix a timestamp to each perf line Based on code from yakovdyadkin & Scott Moe in MR 349 Adds -S 1 option to suffix each performance report line with a timestamp. Format is "%Y-%m-%d %H:%M:%S" This is especially useful when using the -N 0 option and looking for hangs or failure events.	2025-10-28 10:11:23 -07:00
David Addison	0bb567cc02	NCCL_TESTS_VERSION 2.17.5	2025-10-28 09:34:56 -07:00
Shane Snyder	013c49e930	add necessary ifdef guards for device API tests	2025-10-28 09:34:26 -07:00
Shane Snyder	f66d20e360	add runtime guards for ncclAlltoAll()	2025-10-28 09:32:17 -07:00
David Addison	3744121a2d	NCCL_TESTS_VERSION 2.17.4	2025-10-24 17:11:08 -07:00
David Addison	9641693e9b	Add PRINT of nccl-tests, NCCL header and library versions	2025-10-24 17:10:57 -07:00
Shane Snyder	9829ea42b5	add GIN-based device API kernels to alltoall - add GIN-only A2A kernel implementation - add hybrid LSA+GIN A2A kernel implementation - update perf test cases to expose a function for setting devCommRequirements for each device implementation and simplify devCommCreate code path to use this directly instead of complex fallback logic - add missing call to devCommDestroy	2025-10-24 17:10:34 -07:00
David Addison	00f52811b8	Add support for JSON output to perf test framework This adds support for writing structured information about the run to a JSON file. Enable with -J <filename>.json If the target JSON filename already exists then an incrementing numeric suffix will be added to create <filename>.json.<n>	2025-10-17 12:01:25 -07:00
Stephen Sachs	abc46770a9	Check if sufficient GPUs are available The CUDA error message "Test CUDA failure util.cu:706 'invalid device ordinal'" is not as helpful. Test this explicitly and guide the user.	2025-10-02 15:48:13 -07:00
Sylvain Jeaugey	9a5c15461a	Fix compilation for old NCCL versions Fix compilation failure on ctaPolicy with NCCL <= 2.26. Fix compilation failure on local_register with NCCL <= 2.18. Fix ctaPolicy behavior if the tests are compiled with NCCL <= 2.26 but run with NCCL >= 2.27.	2025-09-05 09:15:06 -07:00
David Addison	e12dbb0a14	Update to align with the NCCL 2.28 release Added Device API infrastructure and example kernels Two new command line arguments: -D <num> device kernel implementation to use <0/1/2/3/4> -V <num> number of CTAs to launch device kernels with Added new CTA Policy command line option: -x <policy> set the CTA Policy <0/1/2>	2025-09-04 17:23:22 -07:00
David Addison	c2cb96faac	Update NVCUFLAGS and CXXFLAGS to use -std=c++14	2025-08-29 14:55:31 -07:00
David Addison	f2015cbe82	Modified warmup to run for more message sizes Loops between minBytes and maxBytes doubling size each time Reduced default warmup iteration count to 1 (was 5)	2025-08-25 13:57:51 -07:00
David Addison	fae7cb4727	Merge pull request #316 from martin-belanger/print-program-name Print the name of the program being executed before and after test output	2025-07-24 14:58:54 -07:00
David Addison	6edafa0a9c	Add extra reserved space during maxBytes calculation Also, don't allow minBytes > maxBytes	2025-07-23 16:19:37 -07:00
David Addison	def2d3689c	Minor fix to Makefile Move comments to separate lines	2025-07-23 16:04:30 -07:00
David Addison	97ee098516	Add Turing (SM75) support to CUDA 13.0 builds	2025-06-04 17:54:58 -07:00
David Addison	e7c8825b0b	Wrap ncclCommWindowRegister() calls within ncclGroup	2025-06-03 10:36:53 -07:00
Martin Belanger	dafb70408d	Print the name of the program being executed One thing missing from the stdout of each performance test is the name of the test that is actually being run. This patch adds 2 new messages to the stdout. At the beginning of the execution of a test (e.g. sendrecv_perf) we will now see this message: Collective test starting: sendrecv_perf And at the end, we will now see this: Collective test concluded: sendrecv_perf This is needed when running several tests consecutively and we're trying to parse the stdout to collect the results. For example, using a Python script to parse the stdout, one could retrieve the results for each test and plot them on a graph. This patch makes it easier to implement such a script. Signed-off-by: Martin Belanger <martin.belanger@dell.com>	2025-06-03 11:43:02 -04:00
David Addison	5290298ab6	Reinstate Pascal support for CUDA 12.8+ builds	2025-06-02 09:29:52 -07:00
David Addison	8bc16f4e01	Need to drop Volta (sm_70) support from CUDA 13.0	2025-05-30 18:04:25 -07:00
David Addison	0c60e6a8e4	Fix formatting errors in README.md	2025-05-30 17:43:30 -07:00
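
Commit f2015cbe82 says warmup now "loops between minBytes and maxBytes doubling size each time". The following is a minimal sketch of that size schedule under stated assumptions, not the nccl-tests source: `warmup_sizes` is a hypothetical helper, it assumes `minBytes > 0`, and it only records sizes that fit within `maxBytes`. Whether the real loop also forces a final pass at exactly `maxBytes` is not stated in the log.

```c
#include <stddef.h>

/* Hypothetical helper (not from nccl-tests): fill `out` with the warmup
   message sizes minBytes, 2*minBytes, 4*minBytes, ... that are <= maxBytes.
   Returns the number of sizes written (at most `cap`).
   Assumes minBytes > 0; a zero start would never grow, so it yields nothing. */
static size_t warmup_sizes(size_t minBytes, size_t maxBytes,
                           size_t *out, size_t cap) {
  size_t n = 0;
  if (minBytes == 0 || minBytes > maxBytes) return 0;
  for (size_t size = minBytes; n < cap; size *= 2) {
    out[n++] = size;
    /* Stop before doubling would exceed maxBytes (also avoids overflow). */
    if (size > maxBytes / 2) break;
  }
  return n;
}
```

For example, `minBytes = 8` and `maxBytes = 128` give the five sizes 8, 16, 32, 64, 128; with the default of one warmup iteration (per the same commit), each size would be run once.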


141 Commits
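
Commit 00f52811b8 describes the `-J <filename>.json` output rule: if the target file already exists, "an incrementing numeric suffix will be added to create <filename>.json.<n>". Below is a minimal sketch of that naming rule, assuming the suffix starts at 1 (the log does not say where numbering begins). `pick_json_name` and `file_exists` are hypothetical helpers, and existence is approximated with `fopen(..., "r")`, so an existing but unreadable file would be treated as absent.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if `path` can be opened for reading (used as an existence test). */
static int file_exists(const char *path) {
  FILE *f = fopen(path, "r");
  if (f == NULL) return 0;
  fclose(f);
  return 1;
}

/* Hypothetical helper (not from nccl-tests): write the first unused name among
   base, base.1, base.2, ... into `out`. Returns 0 on success, -1 if `out` is
   too small or no free name is found within the (arbitrary) 9999 limit. */
static int pick_json_name(const char *base, char *out, size_t outlen) {
  size_t len = strlen(base);
  if (len + 1 > outlen) return -1;
  memcpy(out, base, len + 1);
  for (int n = 1; file_exists(out); n++) {
    if (n > 9999) return -1;
    int w = snprintf(out, outlen, "%s.%d", base, n);
    if (w < 0 || (size_t)w >= outlen) return -1;
  }
  return 0;
}
```

Checking for a free name and then opening it is not atomic; a stricter sketch could open with `"wx"` (C11 exclusive mode) and retry on failure instead.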