
Predefinition of often used values in computations - does it change anything?

Source: https://www.devze.com 2023-02-12 16:04 (from: web)
I'm auto generating C code to compute large expressions and try to figure out with simple examples whether it makes sense to predefine certain subparts in separate variables.

As a simple example, say we compute something of the form:

#include <math.h>
double test(double x, double y) {
    const double c[9][9] = { ... }; // constants properly initialized, irrelevant
    double expr =   c[0][0]*x*y
                  + c[1][0]*pow(x,2)*y        + ... + c[8][0]*pow(x,9)*y
                  + c[1][1]*pow(x,2)*pow(y,2) + ... + c[8][1]*pow(x,9)*pow(y,2)
                  + ...;
    return expr;
}

with all c[i][j] properly initialized. In reality those expressions contain tens of millions of multiplications and additions.

A colleague now proposed, to reduce the number of calls to pow() and to cache often-needed values in the expressions, defining every power of x and y in a separate variable, which is no big deal as the code is auto generated anyway, like this:

double xp2 = pow(x,2);
double xp3 = pow(x,3);
double xp4 = pow(x,4);
// ...
// same for pow(y,n)

I think, however, that this is unnecessary, as the compiler should take care of these optimizations.

Unfortunately, I have no experience with reading and interpreting assembly, but I think I can see that all the calls to pow() are optimized out; is this right? Also, does the compiler cache the values of pow(x,2), pow(x,3), etc.?

Thanks in advance for your input!


Using pow with integer arguments... ouch! Typical implementations of pow are tuned for the general case of floating-point arguments, which is why it is usually far slower to write

pow(x, 2) ( = exp(2 * log(x)) )

than

x * x

What I state here is very compiler dependent, though. On one hand, some compilers may not even know that pow(x, 2) will yield the same value for a given x (after all, the extern function pow could have side effects), so you have no guarantee that common subexpressions will be eliminated. On some (many?) platforms/toolchains, the pow function is provided by a library over which the compiler has no control.

On other implementations, though, the compiler may turn those pow calls into multiplications, or at least into intrinsics, which may in turn be specialized for integer exponents. Your mileage will vary.

The first thing I'd do is replace the calls to pow with multiplications. For larger exponents, you can also do, e.g.,

double x2 = x * x;
double x3 = x * x2;
double x4 = x2 * x2;

Note that (credit to @Stephen Canon) doing repeated multiplications (with the above quick exponentiation scheme) introduces roundoff error whose magnitude is proportional to the number of multiplications (i.e. O(log exponent)). This error is typically tolerable, whereas a good pow implementation typically stays within one unit in the last place (ulp) of the exact result.


The compiler may perform common subexpression elimination; remember that it can't assume an arbitrary external function is free of side effects, but if pow is inlined, then it may well do this.


A good way to compute polynomials is Horner's rule, which doesn't require pow() or any extra memory. Your expression is x*y times a polynomial in y, each of whose coefficients is a polynomial in x.

Each of these nine coefficients can be calculated using Horner with 8 multiplies and 8 additions (72 of each), and the polynomial in y takes 8 more multiplies and additions, plus 2 multiplies for the leading x*y, for a total of 82 multiplies and 80 additions, whereas your sample code looks to me like more than 200 multiplications and more than a hundred calls to pow().


pow may be optimized away depending on the toolchain. The only way you can tell is to try it and see.

In the general case, unless the implementation of pow is visible to the compiler as a macro or inline function, the compiler can't cache the result, as it doesn't know what side effects the function may have.


Profile, find out where the bottlenecks are.

If the sub-expressions are used frequently, it may make sense to cache or store the intermediate values. However, accessing these values may take more time than letting the values sit in a data pipeline within the processor. Data fetches outside of the processor are much slower than fetching from its internal data cache.

Also try using Algebra to simplify the mathematical expressions. Perhaps even Linear Algebra to find some more efficient matrix expressions.

You may want to isolate the calculations to expressions involving one variable. Compilers can optimize code better when only one variable is used or changing at a time. For example, substitute the y variable with expressions involving x, if possible. This would reduce to an expression only involving x.

Also search the web for "data driven design" or "data oriented design". These sites show how to optimize code for data centric applications.
