cuda - kernel optimization

Developer https://www.devze.com 2023-03-26 10:49 Source: web

Related topics: gpu nvidia

I created a simple particle system. I have a device with compute capability 2.1. What could I change to optimize the kernel?

I assume that the variables tPos and tVel are stored in registers.

__global__ void particles_kernel(float4 *vbo, float4 *pos, float4 *vel)
{
     int tid = blockIdx.x * blockDim.x + threadIdx.x;

     float4 tPos = pos[tid];
     float4 tVel = vel[tid];

     tPos.x += tVel.x;
     tPos.y += tVel.y;
     tPos.z += tVel.z;

     if(tPos.x < -2.0f)
     {
         tVel.x = -tVel.x;
     }
     else if(tPos.x > 2.0f)
     {
         tVel.x = -tVel.x;
     }


     if(tPos.y < -2.0f)
     {
         tVel.y = -tVel.y;
     }
     else if(tPos.y > 2.0f)
     {
         tVel.y = -tVel.y;
     }


     if(tPos.z < -2.0f)
     {
         tVel.z = -tVel.z;
     }
     else if(tPos.z > 2.0f)
     {
         tVel.z = -tVel.z;
     }


     pos[tid] = tPos;
     vel[tid] = tVel;


     vbo[tid] = make_float4(tPos.x, tPos.y, tPos.z, tPos.w);
}

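Unless I am missing something, your boundary checks can be simplified. Each if/else-if pair flips a velocity component exactly when the corresponding position component lies outside [-2, 2], so a single test on the absolute position is enough:

if (fabsf(tPos.x) > 2.0f) tVel.x = -tVel.x;
if (fabsf(tPos.y) > 2.0f) tVel.y = -tVel.y;
if (fabsf(tPos.z) > 2.0f) tVel.z = -tVel.z;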
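You can check on the host that a single `fabsf` test on the position component flips the velocity under exactly the same conditions as the original if/else-if pair. The sketch below is plain C++, which nvcc also compiles as host code. The function names `reflect_branchy`, `reflect_fabs` and `check_equivalence` are hypothetical. The sketch uses `std::fabs` from `<cmath>` in place of the device function `fabsf`.

```cpp
#include <cassert>
#include <cmath>

// One axis of the question's original test: flip the velocity when the
// position has left the box [-2, 2] on either side.
float reflect_branchy(float pos, float vel) {
    if (pos < -2.0f) {
        vel = -vel;
    } else if (pos > 2.0f) {
        vel = -vel;
    }
    return vel;
}

// Simplified form: one comparison on the absolute position
// (std::fabs here stands in for the device function fabsf).
float reflect_fabs(float pos, float vel) {
    if (std::fabs(pos) > 2.0f) vel = -vel;
    return vel;
}

// Sweep positions inside and outside the box and check that both
// versions agree, then check the exact boundaries explicitly.
void check_equivalence() {
    for (int i = -40; i <= 40; ++i) {
        float pos = 0.1f * static_cast<float>(i);  // roughly -4.0 .. 4.0
        assert(reflect_branchy(pos, 1.5f) == reflect_fabs(pos, 1.5f));
    }
    // Exactly on the boundary: neither version flips.
    assert(reflect_fabs(2.0f, 1.5f) == 1.5f);
    assert(reflect_fabs(-2.0f, 1.5f) == 1.5f);
}
```

Both comparisons are strict (`>`), so a particle sitting exactly on the boundary keeps its velocity in either version.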

However, given the relatively small amount of computation, this change will probably not improve performance. The kernel appears to be memory bound: each thread streams through its data once, loading two float4 values and storing three, while performing only a handful of floating-point operations. Maybe there is additional computation elsewhere in your app that you could combine with this computation to increase the computational density?
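A back-of-the-envelope count makes the memory-bound argument concrete. Each thread loads two `float4` values (`pos`, `vel`) and stores three (`pos`, `vel`, `vbo`), which is 80 bytes. Against that it does only three additions for the position update, roughly 0.04 FLOP per byte.

The sketch below turns this into an arithmetic intensity and a bandwidth-bound lower limit on kernel time. It is plain C++, and the names are hypothetical. It also assumes every access goes to DRAM with no cache reuse, and that the bandwidth figure passed in is something you supply for your own device.

```cpp
#include <cassert>
#include <cstddef>

// Size of a CUDA float4 (four 32-bit floats).
constexpr std::size_t kFloat4Bytes = 4 * sizeof(float);

// particles_kernel reads pos[tid] and vel[tid] and writes pos[tid],
// vel[tid] and vbo[tid]: two float4 loads plus three float4 stores.
constexpr std::size_t kBytesPerParticle = (2 + 3) * kFloat4Bytes;

// Floating-point additions per particle for the position update
// (comparisons and sign flips are not counted).
constexpr int kFlopsPerParticle = 3;

// FLOPs performed per byte of memory traffic.
double arithmetic_intensity() {
    return static_cast<double>(kFlopsPerParticle) /
           static_cast<double>(kBytesPerParticle);
}

// Lower bound on kernel time (seconds) for n particles if the kernel is
// limited purely by memory bandwidth, given in bytes per second.
double min_kernel_time_s(std::size_t n, double bandwidth_bytes_per_s) {
    assert(bandwidth_bytes_per_s > 0.0);
    return static_cast<double>(n * kBytesPerParticle) / bandwidth_bytes_per_s;
}
```

For example, a million particles move 80 MB per step. At a hypothetical sustained 100 GB/s, the kernel therefore cannot finish in less than about 0.8 ms, however the arithmetic is arranged. Extra work fused into the same kernel, operating on values already held in registers, adds FLOPs without adding memory traffic.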