My CUDA kernel doesn't seem to change the values of the arrays I pass in. Here's the relevant host code:
dim3 grid(numNets, N);
dim3 threads(1, 1, 1);
// allocate the arrays and jagged arrays on the device
alloc_dev_memory( state0, state1, d_state0, d_state1,
                  adjlist, d_adjlist, transfer, d_transfer,
                  indeg, d_indeg, d_N, d_K, d_S,
                  d_Spow, d_numNets );
// operate on the device memory
kernel<<< grid, threads >>>( d_state0, d_state1, d_adjlist, d_transfer, d_indeg,
                             d_N, d_K, d_S, d_Spow, d_numNets );
// copy the new states from the device to the host
cutilSafeCall( cudaMemcpy( state0, d_state0, ens_size*sizeof(int),
                           cudaMemcpyDeviceToHost ) );
// copy the new states from the array to the ensemble
for(int i=0; i < numNets; ++i)
    nets[i]->set_state( state0 + N*i );
Here is the kernel code that is called:
// this dummy kernel just sets all the values to 0 for checking later.
__global__ void kernel( int * state0,
                        int * state1,
                        int ** adjlist,
                        luint ** transfer,
                        int * indeg,
                        int * d_N,
                        float * d_K,
                        int * d_S,
                        luint * d_Spow,
                        int * d_numNets )
{
    int N = *d_N;
    luint * Spow = d_Spow;
    int tid = blockIdx.x*N + blockIdx.y;
    state0[tid] = 0;
    state1[tid] = 0;
    for(int k=0; k < indeg[tid]; ++k) {
        adjlist[tid][k] = 0;
    }
    for(int k=0; k < Spow[indeg[tid]]; ++k) {
        transfer[tid][k] = 0;
    }
}
After using cudaMemcpy to copy the state0 array back to the host, I loop through state0 and print every value to stdout. The values are the same as the initial ones, even though my kernel is written to set them all to zero.
The expected output is the initial value of state0 (101111101011), followed by the final value of state0 (all zeros).
A sample run of this code outputs:
101111101011
101111101011
Press ENTER to exit...
The second line should be all zeros. Why isn't this CUDA kernel affecting the state0 array?
I found that the values of N and numNets were garbage values. The offset by N was wrong, so the values were being set outside of the array. @pQB, your suggestion was just what I needed.
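For anyone hitting the same symptom, here is a minimal sketch of one way to avoid it. It is not the code from my project: the names `zero_states`, `run_zero_states`, and `CUDA_CHECK` are made up for illustration, and the sizes are arbitrary. It passes N and numNets to the kernel by value instead of through device pointers (so there is no device-side copy that could hold garbage), guards the index before writing, and checks for errors after the launch, which may surface an out-of-bounds write that would otherwise go unnoticed.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// N and numNets arrive by value, so the kernel never reads them
// through a device pointer that might not have been filled in.
__global__ void zero_states(int *state0, int N, int numNets)
{
    // Skip any block that falls outside the numNets x N layout.
    if (blockIdx.x >= numNets || blockIdx.y >= N) return;
    int tid = blockIdx.x * N + blockIdx.y;
    state0[tid] = 0;
}

// Fill a numNets*N array with ones, zero it on the device, and return it.
std::vector<int> run_zero_states(int N, int numNets)
{
    const int ens_size = N * numNets;
    std::vector<int> state0(ens_size, 1);

    int *d_state0 = nullptr;
    CUDA_CHECK(cudaMalloc(&d_state0, ens_size * sizeof(int)));
    CUDA_CHECK(cudaMemcpy(d_state0, state0.data(), ens_size * sizeof(int),
                          cudaMemcpyHostToDevice));

    dim3 grid(numNets, N);    // same launch shape as in the question
    dim3 threads(1, 1, 1);
    zero_states<<<grid, threads>>>(d_state0, N, numNets);
    CUDA_CHECK(cudaGetLastError());        // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());   // errors raised while the kernel ran

    CUDA_CHECK(cudaMemcpy(state0.data(), d_state0, ens_size * sizeof(int),
                          cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_state0));
    return state0;
}

int main()
{
    // Illustrative sizes only: 3 networks of 4 nodes each.
    std::vector<int> state0 = run_zero_states(4, 3);
    for (int v : state0) printf("%d", v);
    printf("\n");
    return 0;
}
```

If the sizes really do have to live in device memory, the same guard still helps, but it is worth copying them back to the host once and printing them to confirm they hold what you expect before trusting any index computed from them.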