CUDA - kernel optimization

Developer https://www.devze.com 2023-03-26 10:49 Source: Internet
I created a simple particle system. I have a device with compute capability 2.1. What could I change to optimize the kernel?

I assume that the variables tPos and tVel are stored in registers.

__global__ void particles_kernel(float4 *vbo, float4 *pos, float4 *vel)
{
     int tid = blockIdx.x * blockDim.x + threadIdx.x;

     float4 tPos = pos[tid];
     float4 tVel = vel[tid];

     tPos.x += tVel.x;
     tPos.y += tVel.y;
     tPos.z += tVel.z;

     if(tPos.x < -2.0f)
     {
         tVel.x = -tVel.x;
     }
     else if(tPos.x > 2.0f)
     {
         tVel.x = -tVel.x;
     }


     if(tPos.y < -2.0f)
     {
         tVel.y = -tVel.y;
     }
     else if(tPos.y > 2.0f)
     {
         tVel.y = -tVel.y;
     }


     if(tPos.z < -2.0f)
     {
         tVel.z = -tVel.z;
     }
     else if(tPos.z > 2.0f)
     {
         tVel.z = -tVel.z;
     }


     pos[tid] = tPos;
     vel[tid] = tVel;


     vbo[tid] = make_float4(tPos.x, tPos.y, tPos.z, tPos.w);
}


Unless I am missing something, your boundary-check code can be simplified like this:

if (fabsf(tPos.x) > 2.0f) tVel.x = -tVel.x;
if (fabsf(tPos.y) > 2.0f) tVel.y = -tVel.y;
if (fabsf(tPos.z) > 2.0f) tVel.z = -tVel.z;
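
To convince yourself that a single magnitude test on the position behaves like the original if / else if pair, a small sketch along these lines may help. The helper names (reflect_branchy, reflect_abs, check_forms_agree) and the HOST_DEVICE macro are made up for this illustration and are not part of the question's code. The macro only lets the same functions build with nvcc or with an ordinary C++ compiler.

```cpp
#include <cassert>
#include <math.h>

// Let the same helpers compile with nvcc (host + device) or a plain C++ compiler.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Form used in the question: two branches per axis.
HOST_DEVICE inline float reflect_branchy(float pos, float vel, float limit)
{
    if (pos < -limit)     vel = -vel;
    else if (pos > limit) vel = -vel;
    return vel;
}

// Simplified form: one comparison on the magnitude of the position.
HOST_DEVICE inline float reflect_abs(float pos, float vel, float limit)
{
    return (fabsf(pos) > limit) ? -vel : vel;
}

// Host-side check that both forms agree on a sweep of positions
// and on the exact boundary values (where neither form flips).
inline void check_forms_agree(float limit)
{
    for (int i = -40; i <= 40; ++i) {
        float pos = 0.1f * static_cast<float>(i);  // roughly -4.0 .. 4.0
        assert(reflect_branchy(pos, 1.0f, limit) == reflect_abs(pos, 1.0f, limit));
    }
    assert(reflect_branchy( limit, 1.0f, limit) == reflect_abs( limit, 1.0f, limit));
    assert(reflect_branchy(-limit, 1.0f, limit) == reflect_abs(-limit, 1.0f, limit));
}
```

Inside the kernel this would be used as tVel.x = reflect_abs(tPos.x, tVel.x, 2.0f); and likewise for y and z.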

However, given the relatively small amount of computation, this change will probably not improve performance, as the code appears to be memory bound (you are streaming through the data). Maybe there is additional computation elsewhere in your app that you could combine with this kernel to increase the computational density?
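
To put a rough number on "memory bound": each thread in the posted kernel loads two float4 values (pos, vel) and stores three (pos, vel, vbo), while doing only three additions and a handful of comparisons. The sketch below just does that bookkeeping. Float4 is a stand-in for CUDA's float4 so the snippet builds without the CUDA headers, and min_kernel_seconds with its bandwidth parameter is a hypothetical helper; the bandwidth value has to come from your own device's specification or from a measurement.

```cpp
#include <cstddef>

// Stand-in with the same 16-byte layout as CUDA's float4.
struct Float4 { float x, y, z, w; };

constexpr std::size_t kLoadsPerParticle  = 2;  // pos[tid], vel[tid]
constexpr std::size_t kStoresPerParticle = 3;  // pos[tid], vel[tid], vbo[tid]
constexpr std::size_t kAddsPerParticle   = 3;  // tPos.x/y/z += tVel.x/y/z

// Global-memory traffic generated per particle by the posted kernel.
constexpr std::size_t kBytesPerParticle =
    (kLoadsPerParticle + kStoresPerParticle) * sizeof(Float4);

static_assert(kBytesPerParticle == 80, "2 loads + 3 stores of 16 bytes each");

// Lower bound on kernel time if memory bandwidth were the only limit.
// bandwidth_bytes_per_s is an assumed input, not a property read from the device.
inline double min_kernel_seconds(std::size_t particles, double bandwidth_bytes_per_s)
{
    return static_cast<double>(particles * kBytesPerParticle) / bandwidth_bytes_per_s;
}
```

With 80 bytes moved for three additions, shaving a branch or two is unlikely to show up in the timings. If the rest of the application allowed positions to live only in the VBO, the separate pos store would disappear and the traffic would drop to 64 bytes per particle, but whether that is feasible depends on code not shown in the question.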
