开发者

Do modern C++ compilers inline functions which are called exactly once?

开发者 https://www.devze.com 2023-03-29 06:40 出处:网络
As in, say my header file is: class A { void Complicated(); } And my source file void A::Complicated() { ...really long function...

As in, say my header file is:

class A
{
    void Complicated();
}

And my source file

void A::Complicated()
{
    ...really long function...
}

Can I split the source file into

开发者_如何学运维void DoInitialStuff(pass necessary vars by ref or value)
{
    ...
}
void HandleCaseA(pass necessary vars by ref or value)
{
    ...
}
void HandleCaseB(pass necessary vars by ref or value)
{
    ...
}
void FinishUp(pass necessary vars by ref or value)
{
    ...
}
void A::Complicated()
{
    ...
    DoInitialStuff(...);
    switch ...
        HandleCaseA(...)
        HandleCaseB(...)
    ...
    FinishUp(...)
}

Entirely for readability and without any fear of impact in terms of performance?


You should mark the functions static so that the compiler know they are local to that translation unit.

Without static the compiler cannot assume (barring LTO / WPA) that the function is only called once, so is less likely to inline it.

Demonstration using the LLVM Try Out page.

That said, code for readability first, micro-optimizations (and such tweaking is a micro-optimization) should only come after performance measures.


Example:

#include <cstdio>

static void foo(int i) {
  int m = i % 3;
  printf("%d %d", i, m);
}

int main(int argc, char* argv[]) {
  for (int i = 0; i != argc; ++i) {
    foo(i);
  }
}

Produces with static:

; ModuleID = '/tmp/webcompile/_27689_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00"     ; <[6 x i8]*> [#uses=1]

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0                    ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body:                                         ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3                         ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1                        ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc             ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
  ret i32 0
}

declare i32 @printf(i8* nocapture, ...) nounwind

Without static:

; ModuleID = '/tmp/webcompile/_27859_0.bc'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-unknown-linux-gnu"

@.str = private constant [6 x i8] c"%d %d\00"     ; <[6 x i8]*> [#uses=1]

define void @foo(int)(i32 %i) nounwind {
entry:
  %rem = srem i32 %i, 3                           ; <i32> [#uses=1]
  %call = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %i, i32 %rem) ; <i32> [#uses=0]
  ret void
}

declare i32 @printf(i8* nocapture, ...) nounwind

define i32 @main(i32 %argc, i8** nocapture %argv) nounwind {
entry:
  %cmp4 = icmp eq i32 %argc, 0                    ; <i1> [#uses=1]
  br i1 %cmp4, label %for.end, label %for.body

for.body:                                         ; preds = %for.body, %entry
  %0 = phi i32 [ %inc, %for.body ], [ 0, %entry ] ; <i32> [#uses=3]
  %rem.i = srem i32 %0, 3                         ; <i32> [#uses=1]
  %call.i = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([6 x i8]* @.str, i64 0, i64 0), i32 %0, i32 %rem.i) nounwind ; <i32> [#uses=0]
  %inc = add nsw i32 %0, 1                        ; <i32> [#uses=2]
  %exitcond = icmp eq i32 %inc, %argc             ; <i1> [#uses=1]
  br i1 %exitcond, label %for.end, label %for.body

for.end:                                          ; preds = %for.body, %entry
  ret i32 0
}


Depends on aliasing (pointers to that function) and function length (a large function inlined in a branch could throw the other branch out of cache, thus hurting performance).

Let the compiler worry about that, you worry about your code :)


A complicated function is likely to have its speed dominated by the operations within the function; the overhead of a function call won't be noticeable even if it isn't inlined.

You don't have much control over the inlining of a function, the best way to know is to try it and find out.

A compiler's optimizer might be more effective with shorter pieces of code, so you might find it getting faster even if it's not inlined.


If you split up your code into logical groupings the compiler will do what it deems best: If it's short and easy, the compiler should inline it and the result is the same. If however the code is complicated, making an extra function call might actually be faster than doing all the work inlined, so you leave the compiler the option to do that too. On top of all that, the logically split code can be far easier for a maintainer to grok and avoid future bugs.


I suggest you create a helper class to break your complicated function into method calls, much like you were proposing, but without the long, boring and unreadable task of passing arguments to each and every one of these smaller functions. Pass these arguments only once by making them member variables of the helper class.

Don't focus on optimization at this point, make sure your code is readable and you'll be fine 99% of the time.

0

精彩评论

暂无评论...
验证码 换一张
取 消