So, if I have a device (or global) function that creates/copies some data into shared memory and I later call another device function, like so:
__global__ void a(){
__shared__ int blah=0;
fun();
}
__device__ void fun(){
blah = 1; //perform some operations
//do whatever
}
I'm a bit rusty with my CUDA. I think you might have had to "redeclare" the shared variable in the called function (I assume the compiler checked whether a shared variable of that name already existed and, if so, bound to it). That had the effect of creating context, so the variable didn't just come out of nowhere. Alternatively, if it's similar to having a global variable in standard C/C++ and I can just reference it, like I did above, that'd be great.
I am familiar with memory hierarchy, I'm just rusty on the semantics of creating/referencing memory.
Please advise on whether the above sketch would work. Thanks.
No, that won't work in CUDA, any more than it would work in standard C99: blah is declared inside a(), so its scope ends with that function body and fun() cannot see it. Currently, the preferred method of __device__ function compilation is inline expansion (they are also compiled as standalone code objects for the Fermi architecture), but even so, __device__ functions must still obey the standard syntax and scope conventions of C99. So you need to pass arguments which don't have compilation unit scope by reference to __device__ functions.
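As a rough sketch of what passing by reference could look like, here is a reworked version of the code from the question. The out parameter, the run_demo() host helper, and the launch configuration of one block of 32 threads are assumptions added only so the result can be checked. Note also that a __shared__ variable cannot carry an initializer, so the original "__shared__ int blah=0;" has to be split into a declaration and a separate assignment.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// The shared variable is handed in explicitly by reference, so the
// function no longer relies on a name that is out of scope.
__device__ void fun(int &blah) {
    blah = 1;  // perform some operations
}

// 'out' is a hypothetical parameter used only to return the result to the host.
__global__ void a(int *out) {
    __shared__ int blah;               // initializers are not allowed on __shared__
    if (threadIdx.x == 0) blah = 0;    // one thread sets the initial value
    __syncthreads();

    if (threadIdx.x == 0) fun(blah);   // one thread updates it via the reference
    __syncthreads();

    if (threadIdx.x == 0) *out = blah; // copy the result to global memory
}

// Hypothetical host helper: launches the kernel and returns the value it wrote,
// or -1 if a CUDA call failed.
int run_demo() {
    int *d_out = nullptr;
    int h_out = -1;
    if (cudaMalloc(&d_out, sizeof(int)) != cudaSuccess) return -1;
    a<<<1, 32>>>(d_out);
    cudaError_t err = cudaMemcpy(&h_out, d_out, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    return err == cudaSuccess ? h_out : -1;
}

int main() {
    printf("blah = %d\n", run_demo());
    return 0;
}
```

Passing a pointer (int *blah) instead of a reference would work the same way. The alternative, if every __device__ function in the file really needs the variable, is to declare it at file scope so that it has compilation unit scope.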