Passing CUDA Random Generator State by reference

Developer https://www.devze.com 2023-01-24 00:57 Source: web
Is the following code correct when passing the random generator state (CUDA toolkit 3.2, curand.lib) by reference in the functions CalculateValue(curandState *) and GetExponential(curandState *)?

Thanks

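#include <stdio.h>
#include <stdlib.h>
#include <curand_kernel.h>

__device__ double GetExponential(curandState *localState) { 
    /* Draw a uniform double from the generator state; the original snippet had no return statement */
    double u1 = curand_uniform_double(localState); 
    return u1; } 


__device__  double CalculateValue(curandState *localState)  { 
  double x = GetExponential(localState);  
  return x; } 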


__global__ void RunMonteCarloKernel(curandState *state, double *results) { 
    int i = threadIdx.x + blockIdx.x * blockDim.x; 

    /* Copy state to local memory for efficiency */ 
    curandState localState = state[threadIdx.x + blockIdx.x * blockDim.x];    

    results[i] = CalculateValue(&localState); 

    /* Copy state back to global memory */ 
    state[threadIdx.x + blockIdx.x * blockDim.x] = localState; }

__global__ void setup_kernel(curandState *state) { 
    int i = threadIdx.x + blockIdx.x * blockDim.x; 

    /* Each thread gets different seed, a different sequence number, no offset */ 
    curand_init(i, i, 0, &state[i]); } 

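/* Assumed launch configuration; the original snippet does not define these values */
const int threadsPerBlock = 64, totalBlocks = 64;
const int totalThreads = threadsPerBlock * totalBlocks;

/* Error check modeled on the macro used in the cuRAND documentation examples */
#define CUDA_CALL(x) do { if((x) != cudaSuccess) { \
    printf("Error at %s:%d\n",__FILE__,__LINE__); \
    return EXIT_FAILURE;}} while(0)

int main(void) { 
    double *devResults; 
    curandState *devStates; 

    /* Allocate space for prng states on device */ 
    CUDA_CALL(cudaMalloc((void **)&devStates, totalThreads * sizeof(curandState))); 
    /* Allocate space for results on device (missing in the original snippet) */ 
    CUDA_CALL(cudaMalloc((void **)&devResults, totalThreads * sizeof(double))); 

    /* Setup prng states */ 
    setup_kernel<<<totalBlocks, threadsPerBlock>>>(devStates); 

    /* A kernel call needs an execution configuration */ 
    for(int i=0; i< 1000; i++) 
    { 
            RunMonteCarloKernel<<<totalBlocks, threadsPerBlock>>>(devStates, devResults); 
    } 
    CUDA_CALL(cudaFree(devResults)); 
    CUDA_CALL(cudaFree(devStates)); 
    return 0; }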


Is there a problem? It looks ok.

You may want to check out the EstimatePiInlineP sample which is in the MonteCarloCURAND directory of the 3.2 SDK. It uses C++ style pass by reference to avoid taking the address of a local variable. You would need to store the state back to memory at the end of the kernel (as you do in your code).

Passing by C++ reference can assist the compiler by clearly showing that the function can operate on the data directly in the original registers. Taking the address of a local array on a GPU can be detrimental to performance if the compiler cannot be certain that all threads handle the pointer identically (i.e. perform identical operations on the pointer), in which case it will spill the array to local memory. It'll work, but it may be slower.
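As a rough illustration of the pass-by-reference style described above, here is a minimal sketch. It is not the code from the EstimatePiInlineP sample. The helper name GetUniform, the bounds parameter n and the launch configuration are assumptions made for this example, and whether the reference version is actually faster depends on the compiler and toolkit version.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Hypothetical helper: takes the generator state by C++ reference.
// curand_uniform_double still expects a pointer, so the address is
// taken only at the innermost call.
__device__ double GetUniform(curandState &localState)
{
    return curand_uniform_double(&localState);
}

__device__ double CalculateValue(curandState &localState)
{
    return GetUniform(localState);
}

__global__ void setup_kernel(curandState *state, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    // Same seeding scheme as the question: seed i, sequence i, no offset.
    if (i < n) curand_init(i, i, 0, &state[i]);
}

__global__ void RunMonteCarloKernel(curandState *state, double *results, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i >= n) return;

    curandState localState = state[i];        // copy state to a local variable
    results[i] = CalculateValue(localState);  // passed by reference, no explicit &
    state[i] = localState;                    // store the advanced state back
}

int main()
{
    // Assumed launch configuration, chosen only for this sketch.
    const int threadsPerBlock = 64;
    const int totalBlocks = 4;
    const int totalThreads = threadsPerBlock * totalBlocks;

    curandState *devStates = nullptr;
    double *devResults = nullptr;
    if (cudaMalloc(&devStates, totalThreads * sizeof(curandState)) != cudaSuccess) return 1;
    if (cudaMalloc(&devResults, totalThreads * sizeof(double)) != cudaSuccess) return 1;

    setup_kernel<<<totalBlocks, threadsPerBlock>>>(devStates, totalThreads);
    RunMonteCarloKernel<<<totalBlocks, threadsPerBlock>>>(devStates, devResults, totalThreads);

    // Copy one sample back; cudaMemcpy waits for the kernels to finish.
    double first = 0.0;
    cudaError_t err = cudaMemcpy(&first, devResults, sizeof(double), cudaMemcpyDeviceToHost);
    if (err == cudaSuccess) printf("first sample: %f\n", first);

    cudaFree(devResults);
    cudaFree(devStates);
    return err == cudaSuccess ? 0 : 1;
}
```

The kernel body is the same as in the question apart from the bounds check. The only change in the call chain is that CalculateValue and the helper take curandState & instead of curandState *.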
