Is there a way of setting default value for shared memory array?

Developer https://www.devze.com 2023-03-15 14:33 Source: web

Consider the following code:

__global__ void kernel(int *something) {
    extern __shared__ int shared_array[];

    // Some operations on shared_array here.

}

Is it possible to set the whole shared_array to some value - e.g. 0 - without explicitly addressing each cell in some thread?


You can efficiently initialize shared arrays in parallel like this, where SHARED_SIZE is the number of elements in the array and INITIAL_VALUE is the fill value:

// if SHARED_SIZE == blockDim.x, eliminate this loop
for (int i = threadIdx.x; i < SHARED_SIZE; i += blockDim.x) 
    shared_array[i] = INITIAL_VALUE;
__syncthreads();
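
To show the strided loop above in context, here is a minimal sketch of a complete program. The kernel name fill_and_copy, the helper count_mismatches, the output buffer and the sizes are all assumptions made for this example and are not part of the original answer. The element count is passed as a kernel argument because an extern shared array does not carry its own length.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: fills dynamic shared memory with `value`, then copies
// it to global memory so the host can inspect the result.
__global__ void fill_and_copy(int *out, int n, int value) {
    extern __shared__ int shared_array[];

    // Thread t initialises elements t, t + blockDim.x, t + 2 * blockDim.x, ...
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        shared_array[i] = value;
    __syncthreads();  // every element is written before any thread reads it

    for (int i = threadIdx.x; i < n; i += blockDim.x)
        out[i] = shared_array[i];
}

// Hypothetical host helper: returns the number of elements that differ from
// `value`, or -1 on a CUDA error.
int count_mismatches(int n, int threads, int value) {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return -1;

    // The third launch parameter is the dynamic shared memory size in bytes.
    fill_and_copy<<<1, threads, n * sizeof(int)>>>(d_out, n, value);
    if (cudaGetLastError() != cudaSuccess) { cudaFree(d_out); return -1; }

    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != value) ++bad;
    return bad;
}

int main() {
    // 1000 elements, 256 threads: each thread handles 3 or 4 elements.
    printf("mismatches: %d\n", count_mismatches(1000, 256, 7));
    return 0;
}
```

Because the loop strides by blockDim.x, the same kernel works whether the array is smaller than, equal to, or larger than the block.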


No. Shared memory is uninitialised. You have to somehow initialise it yourself, one way or another...

From CUDA C Programming Guide 3.2, Section B.2.4.2, paragraph 2:

__shared__ variables cannot have an initialization as part of their declaration.

This also rules out nontrivial default constructors for shared variables.
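
The restriction can be illustrated with a short sketch for a statically sized array. The kernel name static_shared_demo, the helper run_static_shared_demo and the size N are hypothetical. The commented-out line shows the kind of declaration the guide forbids; the working form declares the array first and assigns to it afterwards.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define N 128  // assumed array size; the kernel expects one block of N threads

__global__ void static_shared_demo(int *out) {
    // __shared__ int counters[N] = {0};  // not allowed: initialiser in the declaration
    __shared__ int counters[N];           // contents are undefined at this point

    counters[threadIdx.x] = 0;            // explicit initialisation, one element per thread
    __syncthreads();

    counters[threadIdx.x] += threadIdx.x;
    out[threadIdx.x] = counters[threadIdx.x];
}

// Hypothetical host helper: returns the number of wrong elements, or -1 on a CUDA error.
int run_static_shared_demo() {
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, N * sizeof(int)) != cudaSuccess) return -1;

    static_shared_demo<<<1, N>>>(d_out);

    int h_out[N];
    cudaError_t err = cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < N; ++i)
        if (h_out[i] != i) ++bad;  // element i should hold 0 + i
    return bad;
}

int main() {
    printf("wrong elements: %d\n", run_static_shared_demo());
    return 0;
}
```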


Yes, you can. You can have the first thread in the block set it while the others don't, e.g.:

extern __shared__ unsigned int local_bin[]; // Size specified in kernel call

if (threadIdx.x == 0) // Wipe on first thread - include " && threadIdx.y == 0" and " && threadIdx.z == 0" if the thread block has 2 or 3 dimensions instead of 1.
{
    // local_bin_size is a hypothetical kernel parameter holding the number of elements in local_bin.
    // Note that cudaMemset cannot be used here: it works on global device memory, not on shared memory.
    for (unsigned int i = 0; i < local_bin_size; ++i)
        local_bin[i] = 0;
}

// Do stuff unrelated to local_bin here    

__syncthreads(); // To make sure the memset above has completed before other threads start writing values to local_bin.

// Do stuff to local_bin here

Ideally, do as much work as possible before the __syncthreads() call, so that the other threads can make progress while thread 0 is still clearing the array. This only matters if threads can finish at noticeably different times, for example because of conditional branching. Note that the for-loop run by thread 0 has to know how many elements local_bin holds, so the size of the array must be passed to the kernel as a parameter.
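
The pattern described above might look like the following sketch, which uses a per-block histogram as the workload. The kernel name histogram, the helper run_histogram_check, the parameter bin_count and the test data are assumptions chosen for illustration. Thread 0 clears local_bin while every thread computes its bin index, which does not touch the shared array, and the barrier comes only after that.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: bin_count is the size of local_bin, passed explicitly.
__global__ void histogram(const unsigned int *data, int n,
                          unsigned int *bins, int bin_count) {
    extern __shared__ unsigned int local_bin[];  // size given at kernel launch

    if (threadIdx.x == 0) {                      // serial wipe by the first thread
        for (int i = 0; i < bin_count; ++i)
            local_bin[i] = 0;
    }

    // Work unrelated to local_bin, done while thread 0 may still be clearing.
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    bool in_range = idx < n;
    unsigned int bin = in_range ? data[idx] % (unsigned int)bin_count : 0u;

    __syncthreads();  // the wipe is complete before anyone updates local_bin

    if (in_range) atomicAdd(&local_bin[bin], 1u);
    __syncthreads();  // all block-local counts are final

    for (int i = threadIdx.x; i < bin_count; i += blockDim.x)
        atomicAdd(&bins[i], local_bin[i]);       // merge into the global result
}

// Hypothetical host helper: returns the number of wrong bins, or -1 on a CUDA error.
int run_histogram_check() {
    const int n = 1024, bin_count = 16, threads = 256;
    std::vector<unsigned int> h_data(n);
    for (int i = 0; i < n; ++i) h_data[i] = (unsigned int)i;  // i % 16 is uniform

    unsigned int *d_data = nullptr, *d_bins = nullptr;
    if (cudaMalloc(&d_data, n * sizeof(unsigned int)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_bins, bin_count * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_data);
        return -1;
    }
    cudaMemcpy(d_data, h_data.data(), n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, bin_count * sizeof(unsigned int));  // global memory, from the host

    histogram<<<n / threads, threads, bin_count * sizeof(unsigned int)>>>(
        d_data, n, d_bins, bin_count);

    std::vector<unsigned int> h_bins(bin_count);
    cudaError_t err = cudaMemcpy(h_bins.data(), d_bins,
                                 bin_count * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_data);
    cudaFree(d_bins);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int b = 0; b < bin_count; ++b)
        if (h_bins[b] != (unsigned int)(n / bin_count)) ++bad;  // expect 64 per bin
    return bad;
}

int main() {
    printf("wrong bins: %d\n", run_histogram_check());
    return 0;
}
```

Both __syncthreads() calls sit outside any conditional, so every thread in the block reaches them, including threads whose idx falls outside the data.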

Original concept source
