
Inline speed and compiler optimization

Developer https://www.devze.com 2023-01-09 06:04 Source: Internet

I'm doing a bit of hands-on research into the speed benefits of making a function inline. I don't have the book with me, but one text I was reading suggested that function calls carry a fairly large overhead cost, and that whenever executable size is either negligible or can be spared, a function should be declared inline for speed.

I've written the following code to test this theory, and from what I can tell, there is no speed benefit from declaring a function as inline. Both functions, when called 4294967295 times on my computer, execute in 196 seconds.

My question is, what would be your thoughts as to why this is happening? Is it modern compiler optimization? Would it be the lack of large calculations taking place in the function?

Any insight on the matter would be appreciated. Thanks in advance friends.

#include <iostream>
#include <time.h>
#include <cstdio>   // for getchar()

// RESEARCH                                                   Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
// Two functions that perform an identical arbitrary floating point calculation
// one function is inline, the other is not.

double test(double a, double b, double c);
double inlineTest(double a, double b, double c);

double test(double a, double b, double c){
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *=b;
    return c;
}

inline
double inlineTest(double a, double b, double c){
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *=b;
    return c;
}

// ENTRY POINT                                                Jared Thomson 2010
////////////////////////////////////////////////////////////////////////////////
int main(){
    const unsigned int maxUINT = -1;
    clock_t start = clock();

    //============================ NON-INLINE TEST ===============================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

    clock_t end = clock();
    std::cout << maxUINT << " calls to non inline function took " 
              << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

    start = clock();

    //============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

    end = clock();
    std::cout << maxUINT << " calls to inline function took " 
              << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

    getchar(); // Wait for input.
    return 0;
} // Main.

Assembly output: PasteBin


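As an optimization request, the inline keyword is basically useless. It is a suggestion only: the compiler is free to ignore it and refuse to inline such a function, and it is also free to inline a function declared without the inline keyword. What the keyword does guarantee has nothing to do with speed: it allows the same function definition to appear in more than one translation unit.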

If you are really interested in doing a test of function call overhead, you should check the resultant assembly to ensure that the function really was (or wasn't) inlined. I'm not intimately familiar with VC++, but it may have a compiler-specific method of forcing or prohibiting the inlining of a function (however the standard C++ inline keyword will not be it).

So I suppose the answer to the larger context of your investigation is: don't worry about explicit inlining. Modern compilers know when to inline and when not to, and will generally make better decisions about it than even very experienced programmers. That's why the inline keyword is often entirely ignored. You should not worry about explicitly forcing or prohibiting inlining of a function unless you have a very specific need to do so (as a result of profiling your program's execution and finding that a bottleneck could be solved by forcing an inline that the compiler has for some reason not done).

Re: the assembly:

; 30   :     const unsigned int maxUINT = -1;
; 31   :     clock_t start = clock();

    mov esi, DWORD PTR __imp__clock
    push    edi
    call    esi
    mov edi, eax

; 32   :     
; 33   :     //============================ NON-INLINE TEST ===============================//
; 34   :     for(unsigned int i = 0; i < maxUINT; ++i)
; 35   :         blank(1.1,2.2,3.3);
; 36   :     
; 37   :     clock_t end = clock();

    call    esi

This assembly is:

  1. Reading the clock
  2. Storing the clock value
  3. Reading the clock again

Note what's missing: calling your function a whole bunch of times.

The compiler has noticed that you don't do anything with the result of the function and that the function has no side-effects, so it is not being called at all.

You can likely get it to call the function anyway by compiling with optimizations off (in debug mode).


Both functions could be inlined. The definition of the non-inline function is in the same compilation unit as the usage point, so the compiler is within its rights to inline it even without you asking.

Post the assembly and we can confirm it for you.

EDIT: the MSVC compiler pragma for banning inlining is:

#pragma auto_inline(off)
    void myFunction() { 
        // ...
    }
#pragma auto_inline(on)


Two things could be happening:

  1. The compiler may be inlining both functions or neither of them. Check your compiler documentation for how to control that.

  2. Your function may be complex enough that the overhead of doing the function call isn't big enough to make a big difference in the tests.

Inlining is great for very small functions but it's not always better. Code bloat can push frequently used code out of the CPU's instruction cache.

In general, inline getter/setter functions and other one-liners. Then, during performance tuning, you can try inlining other functions if you think you'll get a boost.


Your code as posted contains a couple of oddities.

1) The math and output of your test functions are completely independent of the function parameters. If the compiler is smart enough to detect that those functions always return the same value, that might give it incentive to optimize them out entirely, inline or not.

2) Your main function is calling test for both the inline and non-inline tests. If this is the actual code that you ran, then that would have a rather large role to play in why you saw the same results.

As others have suggested, you would do well to examine the actual assembly code generated by the compiler to determine that you're actually testing what you intended to.


Um, shouldn't

//============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

be

//============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
         inlineTest(1.1,2.2,3.3);

?

But if that was just a typo, I would recommend looking at a disassembler or reflector to see whether the code is actually inlined or still makes a call through the stack.


If this test took 196 seconds for each loop, then you must not have turned optimizations on; with optimizations off, generally compilers don't inline anything.

With optimization on, however, the compiler is free to notice that your test function can be completely evaluated at compile time, and crush it down to "return [constant]" -- at which point, it may well decide to inline both functions since they're so trivial, and then notice that the loops are pointless since the function value is not used, and squash that out too! This is basically what I got when I tried it.

So either way, you're not testing what you thought you tested.


Function call overhead ain't what it used to be, compared to the overhead of blowing out the level-1 instruction cache, which is what aggressive inlining does to you. You can easily find reports online of gcc's -Os option (optimize for size) being a better default choice for large projects than -O2, and the big reason for that is that -O2 inlines a lot more aggressively. I would expect it is much the same with MSVC.


The only way I know of to guarantee a function is inline is to #define it.

For example:

#define RADTODEG(x) ((x) * 57.29578)

That said, the only time I would bother with such a function would be in an embedded system. On a desktop/server the performance difference is negligible.


Run it in a debugger and have a look at the generated code to see if your function is always or never inlined. I think it's always a good idea to have a look at the assembler code when you want more knowledge about the optimization the compiler does.


Apologies for a small flame ...

Compilers think in assembly language. You should too. Whatever else you do, just step through the code at the assembler level. Then you'll know exactly what the compiler did.

Don't think of performance in absolute terms like "fast" or "slow". It's all relative, percentage-wise. The way software is made fast is by removing, in successive steps, things that take too large a percent of the time.

Here's the flame: If a compiler can do a pretty good job of inlining functions that clearly need it, and if it can do a really good job of managing registers, I think that's just what it should do. If it can do a reasonable job of unrolling loops that clearly could use it, I can live with that. If it's knocking itself out trying to outsmart me by removing function calls that I clearly wrote and intended to be called, or scrambling my code sanctimoniously trying to save a JMP when that JMP occupies 0.000001% of running time (the way Fortran does), I get annoyed, frankly.

There seems to be a notion in the compiler world that there's no such thing as an unhelpful optimization. No matter how smart the compiler is, real optimization is the programmer's job, and nobody else's.
