I'm using cudaMallocPitch and cudaMemcpy2D in CUDA but I don't get the correct answer!

Developer https://www.devze.com 2023-03-15 09:59 Source: Internet
This is my code, which initializes a matrix d_ref and copies it to the device. I'm not sure whether I'm using cudaMallocPitch and cudaMemcpy2D correctly; I tried to follow the cudaMemcpy2D example at the bottom of page 20 of the CUDA Programming Guide.

All I get in the output is 0.

What's wrong with my code? And is this the best way to do this?

Thanks in advance.

__host__    

float *d_ref;

float **h_ref = new float* [width];
for (int i=0;i<width;i++)
    h_ref[i]= new float [height];

for (int i=0;i<width;i++){
    for (int j=0;j<height;j++){
        h_ref[i][j]=ref_list[j][i]; //transpose
    }   
}

size_t ref_pitch;

cudaMallocPitch(&d_ref, &ref_pitch, width * sizeof(float), height);

cudaMemcpy2D(d_ref, ref_pitch, h_ref, width*sizeof(float),width*sizeof(float), height*sizeof(float), cudaMemcpyHostToDevice);


lowerBound<<<grid, block>>>(d_ref, ...




__global__ void lowerBound (float* d_ref, ....


    float* ref = (float*)((char*)d_ref + blockIdx.x * ref_pitch);

    cuPrintf(" %f \n",ref[threadIdx.x]);


In this line:

cudaMemcpy2D(d_ref, ref_pitch, h_ref, width*sizeof(float),width*sizeof(float), height*sizeof(float), cudaMemcpyHostToDevice);

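Why are you multiplying height by sizeof(float)? The height argument of cudaMemcpy2D is a number of rows, not a byte count (only the width argument is in bytes), so you are transferring far too much data!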
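To make the argument meanings concrete, here is a host-only sketch. The helper copy2d is hypothetical: it takes the same arguments in the same order as cudaMemcpy2D (dst, dpitch, src, spitch, width in bytes, height in rows) but copies with std::memcpy instead of transferring to the device. pitchedRow repeats the kernel's row-address arithmetic, and the pitch passed to toPitched stands in for the value cudaMallocPitch would return (an assumption; the real pitch is chosen by the driver).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical host-only stand-in for cudaMemcpy2D: same argument order and
// meaning, but the copy is done with std::memcpy on the host.
//   dpitch, spitch : bytes from the start of one row to the start of the next
//   widthBytes     : bytes copied from each row
//   heightRows     : number of rows (a count, NOT a byte size)
void copy2d(void* dst, std::size_t dpitch,
            const void* src, std::size_t spitch,
            std::size_t widthBytes, std::size_t heightRows) {
    // The CUDA docs require width <= dpitch and width <= spitch.
    assert(widthBytes <= dpitch && widthBytes <= spitch);
    for (std::size_t r = 0; r < heightRows; ++r) {
        std::memcpy(static_cast<char*>(dst) + r * dpitch,
                    static_cast<const char*>(src) + r * spitch,
                    widthBytes);
    }
}

// Same address arithmetic as the kernel line
//   float* ref = (float*)((char*)d_ref + blockIdx.x * ref_pitch);
float* pitchedRow(float* base, std::size_t pitch, std::size_t row) {
    return reinterpret_cast<float*>(reinterpret_cast<char*>(base) + row * pitch);
}

// Copies a contiguous height x width host matrix into a zero-filled buffer
// whose rows are `pitch` bytes apart (pitch assumed to be a multiple of
// sizeof(float)).
std::vector<float> toPitched(const std::vector<float>& host,
                             std::size_t width, std::size_t height,
                             std::size_t pitch) {
    std::vector<float> pitched(pitch / sizeof(float) * height, 0.0f);
    copy2d(pitched.data(), pitch, host.data(), width * sizeof(float),
           width * sizeof(float), height);  // height, not height * sizeof(float)
    return pitched;
}
```

With a contiguous host buffer (call it h_flat, a hypothetical name), the matching real call would be cudaMemcpy2D(d_ref, ref_pitch, h_flat, width*sizeof(float), width*sizeof(float), height, cudaMemcpyHostToDevice).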


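The call to cudaMemcpy2D as written assumes that h_ref is a 2D array of 'width' x 'height' float elements stored contiguously, when in fact it is a 1D array of 'width' pointers, each pointing to a separately allocated row. The copy therefore reads the pointer values themselves (and whatever memory follows them) instead of the matrix data.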
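If the row-pointer layout has to stay on the host side, one option is to pack it into a contiguous buffer before a single 2D copy (another is one cudaMemcpy per row). A minimal sketch; flattenRows is a hypothetical name, and it assumes every row has the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Packs a jagged "array of row pointers" (like h_ref in the question) into one
// contiguous row-major buffer: row r starts at element r * rowLen. Only memory
// laid out like this can be the src of a single cudaMemcpy2D call.
std::vector<float> flattenRows(float* const* rows, std::size_t numRows,
                               std::size_t rowLen) {
    std::vector<float> flat(numRows * rowLen);
    for (std::size_t r = 0; r < numRows; ++r) {
        assert(rows[r] != nullptr);  // every row must have been allocated
        for (std::size_t c = 0; c < rowLen; ++c) {
            flat[r * rowLen + c] = rows[r][c];
        }
    }
    return flat;
}
```

The extra host copy costs one pass over the matrix, but it turns many small allocations into one buffer whose base pointer and row stride (rowLen * sizeof(float)) are exactly what cudaMemcpy2D expects for src and spitch.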

Instead of representing the matrix as a 1D array of pointers to rows, I would suggest storing it in a single 1D array of 'width' x 'height' floats and using macros to access elements by row and column.
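A sketch of that suggestion applied to the question's transpose step. IDX and transposeFlat are hypothetical names. The result is one contiguous row-major buffer, so its data() pointer can be passed to cudaMemcpy2D as src, with spitch and width both equal to the number of columns times sizeof(float) and height equal to the number of rows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical accessor macro: element (row, col) of a row-major matrix that
// has `cols` elements per row, stored in one contiguous 1D array.
#define IDX(row, col, cols) ((row) * (cols) + (col))

// Writes the transpose of a rows x cols matrix directly into a flat
// cols x rows buffer, replacing the new[]-per-row allocation in the question.
std::vector<float> transposeFlat(const std::vector<float>& src,
                                 std::size_t rows, std::size_t cols) {
    assert(src.size() == rows * cols);
    std::vector<float> dst(rows * cols);
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            dst[IDX(c, r, rows)] = src[IDX(r, c, cols)];  // dst has `rows` per row
    return dst;
}
```

Inside a kernel that reads from a cudaMallocPitch allocation, the row stride is ref_pitch bytes rather than cols elements, so a row is still located with the char* arithmetic from the question before indexing into it by column.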
