How good is NVCC at code optimizations?

Developer https://www.devze.com 2023-04-06 18:22 Source: web

How well does NVCC optimize device code? Does it do any sort of optimizations like constant folding and common subexpression elimination?

E.g., will it reduce the following:

float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);

to this:

float sqrt_2pi = sqrtf(2 * M_PI); // Compile time constant
float a = 1 / sqrt_2pi;
float b = c / sqrt_2pi;

What about cleverer optimizations that depend on knowing the semantics of math functions? For example, will it reduce this:

float a = 1 / sqrtf(c * M_PI);
float b = c / sqrtf(M_PI);

to this:

float sqrt_pi = sqrtf(M_PI); // Compile time constant
float a = 1 / (sqrt_pi * sqrtf(c));
float b = c / sqrt_pi;

The compiler is way ahead of you. In your example:

float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);

nvopencc (Open64) will emit this:

    mov.f32         %f2, 0f40206c99;        // 2.50663
    div.full.f32    %f3, %f1, %f2;
    mov.f32         %f4, 0f3ecc422a;        // 0.398942

which is equivalent to

float b = c / 2.50663f;
float a = 0.398942f;
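
The 0f... operands in the PTX above are raw IEEE-754 single-precision bit patterns written in hex, so the folded constants can be checked by hand. The following is a minimal sketch in host-side C, not something taken from the original answer; the helper name ptx_bits_to_float is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as an IEEE-754
   single-precision float. PTX writes such literals as "0f" followed
   by eight hex digits, e.g. 0f40206c99. */
static float ptx_bits_to_float(uint32_t bits)
{
    float value;

    /* Assumes float is 32 bits wide, as it is on common platforms. */
    assert(sizeof(float) == sizeof(uint32_t));

    /* memcpy is the well-defined way to type-pun in C. */
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling ptx_bits_to_float(0x40206c99u) should give roughly 2.50663, i.e. sqrtf(2 * M_PI), and ptx_bits_to_float(0x3ecc422au) roughly 0.398942, i.e. its reciprocal, which matches the comments in the PTX listing.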

The second case gets compiled to this:

float a = 1 / sqrtf(c * 3.14159f); // 0f40490fdb
float b = c / 1.77245f; // 0f3fe2dfc5

I am guessing the expression the compiler generates for a should be more accurate than your "optimized" version, but about the same speed.
