Can I create threads in __device__ functions, like:
__device__ float func()
{
int idx = threadIdx.x + blockIdx.x * blockDim.x;
// do stuff
return some_float;
}
Or can you only make threads in __global__ kernels?
int idx = threadIdx.x + blockIdx.x * blockDim.x;
Short answer: The above line of code is perfectly valid in __device__ functions. However, it does not "create" threads. It simply computes an index in idx for the current thread, using that thread's values of threadIdx.x, blockIdx.x and blockDim.x.
The only way to create threads in CUDA is to launch a kernel with the <<<>>> syntax, which specifies the number and grouping of threads via the grid and block dimensions:
int blockSize = 128;
int gridSize = (N + blockSize - 1) / blockSize;
myKernel<<<gridSize, blockSize>>>();
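To make the relationship concrete, here is a minimal sketch putting the two pieces together (the names myKernel, func, and N are illustrative placeholders, not a fixed API): the <<<>>> launch is what creates the threads, while the __device__ function merely executes inside each already-existing thread.

```cuda
#include <cstdio>

// Runs within an existing thread; creates nothing.
__device__ float func(int idx)
{
    return 2.0f * idx;
}

__global__ void myKernel(float *out, int N)
{
    // Same per-thread index computation as above.
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < N)               // guard: the grid may have more threads than N
        out[idx] = func(idx);  // call the __device__ helper from the kernel
}

int main()
{
    const int N = 1000;
    float *out;
    cudaMallocManaged(&out, N * sizeof(float));

    int blockSize = 128;
    int gridSize = (N + blockSize - 1) / blockSize;  // ceil(N / blockSize)
    myKernel<<<gridSize, blockSize>>>(out, N);       // threads are created here
    cudaDeviceSynchronize();

    printf("out[10] = %f\n", out[10]);
    cudaFree(out);
    return 0;
}
```

Note that the launch rounds the grid size up, so the `idx < N` bounds check inside the kernel is what keeps the extra threads in the last block from writing out of range.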