* Fallback to NCCL for various patterns when input size is large.
Move the previous implementation to cpp side.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
* Revising.
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>
---------
Signed-off-by: Yukun He <23156053+hyukn@users.noreply.github.com>