Mirror of https://github.com/NVIDIA/nccl-tests.git, synced 2026-01-14 02:47:21 +08:00
* Added "verifiable", a suite of kernels for generating and verifying reduction input and output arrays in a bit-precise way.
* Data corruption errors are now reported as the number of wrong elements instead of the max deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-power-of-2 rank counts.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield() with sched_yield().
* Bugfix to the CPU-side barrier/allreduce implementations.

| Name | | |
|---|---|---|
| inexact_regress.cu | ||
| Makefile | ||
| verifiable.cu | ||
| verifiable.h | ||
| verifiable.mk | ||
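
The change notes above say that data corruption is now reported as a count of wrong elements rather than a maximum deviation. The following is a minimal sketch of the difference between the two metrics, assuming float buffers that have already been copied to the host. `countWrongElements` and `maxDeviation` are hypothetical names chosen for illustration, not functions from nccl-tests, and the actual kernels in `verifiable.cu` may work differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper (not from nccl-tests): counts the elements whose bit
// patterns differ between the expected and actual buffers.
inline std::size_t countWrongElements(const std::vector<float>& expected,
                                      const std::vector<float>& actual) {
  assert(expected.size() == actual.size());
  std::size_t wrong = 0;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    // Compare raw bytes so the check is bit-precise
    // (for example, +0.0f and -0.0f count as different).
    if (std::memcmp(&expected[i], &actual[i], sizeof(float)) != 0) {
      ++wrong;
    }
  }
  return wrong;
}

// Hypothetical helper: the older style of metric, the largest absolute
// difference between corresponding elements.
inline double maxDeviation(const std::vector<float>& expected,
                           const std::vector<float>& actual) {
  assert(expected.size() == actual.size());
  double maxDev = 0.0;
  for (std::size_t i = 0; i < expected.size(); ++i) {
    double d = std::fabs(static_cast<double>(expected[i]) -
                         static_cast<double>(actual[i]));
    if (d > maxDev) {
      maxDev = d;
    }
  }
  return maxDev;
}
```

A count of wrong elements shows how widespread a corruption is, while a maximum deviation only reports the single worst element, so a bit-precise count is the more natural metric when the expected output can be reproduced exactly.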