3D image indices

Developer https://www.devze.com 2023-04-03 11:21 Source: Internet

I have an image of size 512 x 512 x 512. I need to process all the voxels individually. How can I get the thread ID to do this? If I use a 1D thread ID, the number of blocks will exceed 65535.

    int id = blockIdx.x*blockDim.x + threadIdx.x;

Note: my card doesn't support 3D grids.

You can use 3D indices with CUDA 4.0 and compute capability 2.0+. Example code:

int blocksInX = (nx+8-1)/8;
int blocksInY = (ny+8-1)/8;
int blocksInZ = (nz+8-1)/8;

dim3 Dg(blocksInX, blocksInY, blocksInZ);
dim3 Db(8, 8, 8);
foo_kernel<<<Dg, Db>>>(R, nx, ny, nz);

...

__global__ void foo_kernel( float* R, const int nx, const int ny, const int nz )
{
  unsigned int xIndex = blockDim.x * blockIdx.x + threadIdx.x;
  unsigned int yIndex = blockDim.y * blockIdx.y + threadIdx.y;
  unsigned int zIndex = blockDim.z * blockIdx.z + threadIdx.z;

  if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
  {
    unsigned int index_out = xIndex + nx*yIndex + nx*ny*zIndex;
    ...
    R[index_out] = ...;
  }
}

If your device doesn't support compute capability 2.0, there is a trick: fold the z blocks into the y dimension of a 2D grid and recover both block indices inside the kernel:

int threadsInX = 16;
int threadsInY = 4;
int threadsInZ = 4;

int blocksInX = (nx+threadsInX-1)/threadsInX;
int blocksInY = (ny+threadsInY-1)/threadsInY;
int blocksInZ = (nz+threadsInZ-1)/threadsInZ;

dim3 Dg = dim3(blocksInX, blocksInY*blocksInZ);
dim3 Db = dim3(threadsInX, threadsInY, threadsInZ);

foo_kernel<<<Dg, Db>>>(R, nx, ny, nz, blocksInY, 1.0f/(float)blocksInY);

__global__ void foo_kernel(float *R, const int nx, const int ny, const int nz,
                           unsigned int blocksInY, float invBlocksInY)
{

    unsigned int blockIdxz = __float2uint_rd(blockIdx.y * invBlocksInY);
    unsigned int blockIdxy = blockIdx.y - __umul24(blockIdxz, blocksInY);
    unsigned int xIndex = __umul24(blockIdx.x, blockDim.x) + threadIdx.x;
    unsigned int yIndex = __umul24(blockIdxy, blockDim.y) + threadIdx.y;
    unsigned int zIndex = __umul24(blockIdxz, blockDim.z) + threadIdx.z;

    if ( (xIndex < nx) && (yIndex < ny) && (zIndex < nz) )
    {
        unsigned int index = xIndex + nx*yIndex + nx*ny*zIndex;
        ...
        R[index] = ...;
    }

}
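
The block-index recovery in the kernel above can also be written with plain integer division and remainder, which is exact and sidesteps any rounding concern with the float reciprocal. The following is a minimal host-side sketch of that arithmetic only, so it can be checked without a GPU; `BlockYZ` and `unfoldBlockIndex` are hypothetical names introduced for illustration, and `foldedY` stands in for `blockIdx.y`.

```cpp
#include <cassert>

// Hypothetical helper type: the two block indices packed into blockIdx.y.
struct BlockYZ {
    unsigned int y;  // recovered block index along y
    unsigned int z;  // recovered block index along z
};

// foldedY plays the role of blockIdx.y for a grid launched as
// dim3(blocksInX, blocksInY * blocksInZ).
BlockYZ unfoldBlockIndex(unsigned int foldedY, unsigned int blocksInY)
{
    assert(blocksInY > 0);
    BlockYZ b;
    b.z = foldedY / blocksInY;  // unsigned integer division rounds down
    b.y = foldedY % blocksInY;  // same value as foldedY - b.z * blocksInY
    return b;
}
```

For the 512 x 512 x 512 image with 16 x 4 x 4 threads per block, `blocksInY` and `blocksInZ` are both 128, so the folded y dimension holds 16384 blocks, which is within the 65535 limit.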

You could use a multi-dimensional grid; it gives you many more block indices.

Note that your PC's memory is not 3D. The three dimensions are just a matter of how you view the data, so you can address your 3D image through a single pointer.

Array[i][j][z] is the same as Array2[i*cols + j + rows*cols*z];

Now feed Array2 to CUDA and work in a single dimension.
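
As a rough illustration of that mapping, here is a small host-side sketch that converts a (row, column, slice) triple to the single offset given above and back again. The names `flattenIndex`, `unflattenIndex` and `Voxel` are hypothetical, and the layout assumed is exactly the one in the formula: `j` varies fastest, then `i`, then `z`.

```cpp
#include <cassert>
#include <cstddef>

// Offset of voxel (i, j, z): i is the row, j the column, z the slice.
// Mirrors the formula i*cols + j + rows*cols*z from the text.
std::size_t flattenIndex(std::size_t i, std::size_t j, std::size_t z,
                         std::size_t rows, std::size_t cols)
{
    assert(i < rows && j < cols);
    return i * cols + j + rows * cols * z;
}

// Hypothetical helper type holding the three recovered coordinates.
struct Voxel {
    std::size_t i;
    std::size_t j;
    std::size_t z;
};

// Inverse mapping: recover (i, j, z) from a linear offset.
Voxel unflattenIndex(std::size_t offset, std::size_t rows, std::size_t cols)
{
    assert(rows > 0 && cols > 0);
    const std::size_t sliceSize = rows * cols;
    const std::size_t inSlice = offset % sliceSize;  // position within slice z
    Voxel v;
    v.z = offset / sliceSize;
    v.i = inSlice / cols;
    v.j = inSlice % cols;
    return v;
}
```

Inside a kernel, the same inverse mapping lets a 1D thread ID be turned back into voxel coordinates when the computation needs them.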

If you need a larger grid, CUDA supports 2D grids on all hardware, and the most recent versions of the CUDA toolkit also support 3D grids on current Fermi hardware.

However, it isn't strictly necessary to have such large grids. If each voxel operation is independent, then why not just use a 1D grid, but have each thread process more than one voxel? Not only would such a scheme not need larger 2D or 3D grids, it might well be more efficient because the fixed costs associated with scheduling and initialization of a block can be amortized over multiple voxel calculations.
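
One common way to have each thread process more than one voxel is a strided loop: a thread starts at its own 1D ID and advances by the total number of threads in the grid. The sketch below emulates that loop on the host so the coverage can be checked without a GPU; `processStrided` and `coversEachVoxelOnce` are hypothetical names, and the increment stands in for the real per-voxel work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates the work of one thread. In a real kernel the loop would read:
//   for (int v = blockIdx.x * blockDim.x + threadIdx.x; v < n;
//        v += gridDim.x * blockDim.x) { ... }
void processStrided(std::vector<int>& visits, std::size_t tid,
                    std::size_t totalThreads)
{
    assert(totalThreads > 0);
    for (std::size_t v = tid; v < visits.size(); v += totalThreads) {
        visits[v] += 1;  // placeholder for the per-voxel computation
    }
}

// Runs every emulated thread in turn and reports whether each voxel
// was visited exactly once.
bool coversEachVoxelOnce(std::size_t numVoxels, std::size_t numBlocks,
                         std::size_t threadsPerBlock)
{
    std::vector<int> visits(numVoxels, 0);
    const std::size_t totalThreads = numBlocks * threadsPerBlock;
    for (std::size_t tid = 0; tid < totalThreads; ++tid) {
        processStrided(visits, tid, totalThreads);
    }
    for (std::size_t v = 0; v < visits.size(); ++v) {
        if (visits[v] != 1) {
            return false;
        }
    }
    return true;
}
```

With this pattern the grid size becomes a tuning parameter rather than a function of the image size, so it can stay well under the 1D block limit.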

I used something like this:

In the host code, define your grid:

dim3 altgrid, altthreads;
altgrid.x = lx; altgrid.y = ly; altgrid.z = 1;
altthreads.x = lz; altthreads.y = 1; altthreads.z = 1;

and in the kernel

int idx = threadIdx.x;
int idy = blockIdx.x ;
int idz = blockIdx.y ;

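Since the array on the device is only 1D, you retrieve the [idx][idy][idz] element of a matrix A as A[ind], where ind = idz + lz*(idy + ly*idx). Note that with the launch above idx ranges over lz, idy over lx and idz over ly, so this formula is only consistent when lx = ly = lz, as in the 512 x 512 x 512 case.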

I hope it helps
