
CUDA: Calling a __device__ function from a kernel

https://www.devze.com 2023-02-26 11:19 Source: web
I have a kernel that calls a device function inside an if statement. The code is as follows:

__device__ void SetValues(int *ptr,int id)
{
    if(ptr[threadIdx.x]==id) // the question concerns this line
          ptr[threadIdx.x]++;
}

__global__ void Kernel(int *ptr)
{
    if(threadIdx.x<2)
         SetValues(ptr,threadIdx.x);
}

In the kernel, threads 0 and 1 call SetValues concurrently. What happens after that? There are now two concurrent calls to SetValues. Does each function call execute serially, so that they behave like two separate kernel launches?


CUDA actually inlines all device functions by default (although Fermi and newer architectures also support a proper ABI with function pointers and true function calls). So your example code gets compiled to something like this:

__global__ void Kernel(int *ptr)
{
    if(threadIdx.x<2)
        if(ptr[threadIdx.x]==threadIdx.x)
            ptr[threadIdx.x]++;
}

Execution happens in parallel, just like any other kernel code: there is no serialization of the calls. If you engineer a memory race into a device function, no implicit serialization mechanism will save you.
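In this particular example there is no race, because each thread only touches ptr[threadIdx.x], its own element. A race would only appear if distinct threads could read-modify-write the same element; in that case the usual fix is an atomic operation. A minimal sketch (the shared-index condition and the function name SetValuesShared are hypothetical, not from the original post):

```cuda
#include <cstdio>

// Hypothetical variant: every thread conditionally increments the SAME
// element ptr[0]. The plain ++ would race; atomicAdd makes the
// read-modify-write safe under contention.
__device__ void SetValuesShared(int *ptr, int id)
{
    if (ptr[0] == id)
        atomicAdd(&ptr[0], 1);
}

__global__ void Kernel(int *ptr)
{
    if (threadIdx.x < 2)
        SetValuesShared(ptr, 0);
}
```

Note that atomics serialize only the conflicting memory updates, not the function calls themselves; everything else in the inlined body still runs in parallel across threads.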
