Atomic Add with long int is not working

Source: https://www.devze.com, 2023-03-08 12:53 (origin: web)
As the CUDA programming guide suggests, I want to call the atomicAdd function:

unsigned long long int atomicAdd(unsigned long long int* address,
                             unsigned long long int val);

But when I call this with two variables:

unsigned long long int *c and unsigned long long int sum

I get this error:

 dotproduct_kernel.cu(23): error: no instance of overloaded function "atomicAdd" matches the argument list
        argument types are: (unsigned long long *, unsigned long long)

I didn't know that long long int really exists, so I also tried long int long, but everything fails.

I need a big data type because my result is going to be close to 10^14.

Here is all the information about my device. I guess the compute capability is 1.2, right?

Major revision number:         1
Minor revision number:         2
Name:                          GeForce GT 240
Total global memory:           1073020928
Total shared memory per block: 16384
Total registers per block:     16384
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     512
Maximum dimension 0 of block:  512
Maximum dimension 1 of block:  512
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   65535
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   1
Clock rate:                    1340000
Total constant memory:         65536
Texture alignment:             256
Concurrent copy and execution: Yes
Number of multiprocessors:     12
Kernel execution timeout:      Yes

This is the complete code:

__global__ void dot (long int *a, long int *b, long int *c){
    __shared__ long int temp[THREADS_PER_BLOCK];
    c[0] = 0;
    long index = (blockIdx.x * blockDim.x) + threadIdx.x;
    temp[threadIdx.x] = a[index] * b[index];

    __syncthreads();

    if( 0 == threadIdx.x ){
        long int sum = 0;
        int i;
        for( i = 0; i<THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum); // remember to compile with -arch=sm_11
    }
}


Be sure you compile your code with -arch=sm_11 or above (by default it is compiled for compute capability 1.0). Also be careful if you are using the common.mk file included in the SDK, as it could override some of your flags.

I'm sorry, I was almost sure that the minimum requirement for atomicAdd was compute capability 1.1, but for the 64-bit version it is 1.2 (which your GPU supports), so compile with -arch=sm_12. I also compiled your kernel successfully after changing the types to 'unsigned long long' ('long int' is not a valid data type for atomicAdd). See section B.11.1.1, atomicAdd(), in the NVIDIA CUDA C Programming Guide, v3.2:

Atomic functions operating on shared memory and atomic functions operating on 64-bit words are only available for devices of compute capability 1.2 and above.
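
For illustration, here is a minimal sketch of how the kernel might look with 'unsigned long long' throughout. Some parts are my own assumptions and not from your post: the value 256 for THREADS_PER_BLOCK, the extra length parameter n with its bounds check, and the host wrapper dotOnDevice, which is a hypothetical helper. I also moved the zeroing of c out of the kernel and into a cudaMemset before the launch, because with c[0] = 0 inside the kernel one block can reset the total after another block has already added to it.

```cuda
// Compile for a device that has 64-bit atomics, e.g. nvcc -arch=sm_12 on a
// toolkit that still supports that target (or any newer architecture).
#include <cuda_runtime.h>

#define THREADS_PER_BLOCK 256  // assumed value; the question does not show it

// Each thread stores one product in shared memory, thread 0 of each block
// sums the block's products, and one 64-bit atomicAdd per block adds that
// partial sum to *c. The kernel must be launched with THREADS_PER_BLOCK
// threads per block.
__global__ void dot(const unsigned long long *a,
                    const unsigned long long *b,
                    unsigned long long *c, int n)
{
    __shared__ unsigned long long temp[THREADS_PER_BLOCK];

    int index = blockIdx.x * blockDim.x + threadIdx.x;
    // Threads past the end of the arrays contribute 0 to the block sum.
    temp[threadIdx.x] = (index < n) ? a[index] * b[index] : 0ULL;

    __syncthreads();

    if (threadIdx.x == 0) {
        unsigned long long sum = 0;
        for (int i = 0; i < THREADS_PER_BLOCK; i++) {
            sum += temp[i];
        }
        atomicAdd(c, sum);  // matches the unsigned long long overload
    }
}

// Hypothetical host wrapper, assuming n > 0. Error checking is omitted to
// keep the sketch short; real code should check every CUDA call.
unsigned long long dotOnDevice(const unsigned long long *a,
                               const unsigned long long *b, int n)
{
    size_t bytes = n * sizeof(unsigned long long);
    unsigned long long *d_a, *d_b, *d_c;
    unsigned long long result = 0;

    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, sizeof(unsigned long long));

    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    // Zero the accumulator once, before the launch, instead of in the kernel.
    cudaMemset(d_c, 0, sizeof(unsigned long long));

    int blocks = (n + THREADS_PER_BLOCK - 1) / THREADS_PER_BLOCK;
    dot<<<blocks, THREADS_PER_BLOCK>>>(d_a, d_b, d_c, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(&result, d_c, sizeof(result), cudaMemcpyDeviceToHost);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return result;
}
```

A result near 10^14 is far beyond what a 32-bit integer can hold but fits comfortably in an unsigned long long, so the single accumulator does not overflow for sums of that size.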

Hope this helps.
