Added Device API infrastructure and example kernels
Added two new command line arguments:
  -D <num>     device kernel implementation to use <0/1/2/3/4>
  -V <num>     number of CTAs to launch device kernels with
Added a new CTA Policy command line option:
  -x <policy>  set the CTA Policy <0/1/2>