How good is NVCC at code optimizations?

Developer https://www.devze.com 2023-04-06 18:22 Source: Internet
How well does NVCC optimize device code? Does it do any sort of optimizations like constant folding and common subexpression elimination?

E.g., will it reduce the following:

float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);

to this:

float sqrt_2pi = sqrtf(2 * M_PI); // Compile time constant
float a = 1 / sqrt_2pi;
float b = c / sqrt_2pi;

What about more clever optimizations that rely on knowing the semantics of math functions, such as reducing:

float a = 1 / sqrtf(c * M_PI);
float b = c / sqrtf(M_PI);

to this:

float sqrt_pi = sqrtf(M_PI); // Compile time constant
float a = 1 / (sqrt_pi * sqrtf(c));
float b = c / sqrt_pi;


The compiler is way ahead of you. In your example:

float a = 1 / sqrtf(2 * M_PI);
float b = c / sqrtf(2 * M_PI);

nvopencc (Open64) will emit this:

    mov.f32         %f2, 0f40206c99;        // 2.50663
    div.full.f32    %f3, %f1, %f2;
    mov.f32         %f4, 0f3ecc422a;        // 0.398942

which is equivalent to

float b = c / 2.50663f;
float a = 0.398942f;
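
The 0f... operands in the PTX above are the raw IEEE-754 single-precision bit patterns of the folded constants, written in hexadecimal. As a minimal host-side sketch for checking them by hand, the helper below reinterprets such a pattern as a float. The name ptx_hex_to_float is made up for this illustration and is not part of CUDA or any toolkit API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the 32-bit pattern that follows the
   "0f" prefix of a PTX literal as an IEEE-754 single-precision float.
   memcpy is used instead of pointer casts to avoid undefined behaviour. */
static float ptx_hex_to_float(uint32_t bits)
{
    float value;

    /* Assumes a platform where float is a 32-bit IEEE-754 type. */
    assert(sizeof value == sizeof bits);
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Calling ptx_hex_to_float(0x40206c99u) gives roughly 2.50663, i.e. sqrtf(2 * M_PI), and ptx_hex_to_float(0x3ecc422au) gives roughly 0.398942, i.e. 1 / sqrtf(2 * M_PI), which matches the comments in the listing.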

The second case gets compiled to this:

float a = 1 / sqrtf(c * 3.14159f); // 0f40490fdb
float b = c / 1.77245f; // 0f3fe2dfc5

I am guessing that the expression the compiler generates for a is more accurate than your "optimized" version, and about the same speed.
