gcc argument register spilling on x86-64

Developer https://www.devze.com 2023-03-31 11:47 Source: Internet
I'm doing some experimenting with x86-64 assembly. Having compiled this dummy function:

long utilfunc(long x, long y, long z);  /* assumed prototype; defined elsewhere */

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}

With gcc -O0 -g I was surprised to find the following at the beginning of the function's assembly:

0000000000400520 <myfunc>:
  400520:       55                      push   rbp
  400521:       48 89 e5                mov    rbp,rsp
  400524:       48 83 ec 50             sub    rsp,0x50
  400528:       48 89 7d d8             mov    QWORD PTR [rbp-0x28],rdi
  40052c:       48 89 75 d0             mov    QWORD PTR [rbp-0x30],rsi
  400530:       48 89 55 c8             mov    QWORD PTR [rbp-0x38],rdx
  400534:       48 89 4d c0             mov    QWORD PTR [rbp-0x40],rcx
  400538:       4c 89 45 b8             mov    QWORD PTR [rbp-0x48],r8
  40053c:       4c 89 4d b0             mov    QWORD PTR [rbp-0x50],r9
  400540:       48 8b 45 d8             mov    rax,QWORD PTR [rbp-0x28]
  400544:       48 0f af 45 d0          imul   rax,QWORD PTR [rbp-0x30]
  400549:       48 0f af 45 c8          imul   rax,QWORD PTR [rbp-0x38]
  40054e:       48 0f af 45 c0          imul   rax,QWORD PTR [rbp-0x40]
  400553:       48 0f af 45 b8          imul   rax,QWORD PTR [rbp-0x48]
  400558:       48 0f af 45 b0          imul   rax,QWORD PTR [rbp-0x50]
  40055d:       48 0f af 45 10          imul   rax,QWORD PTR [rbp+0x10]
  400562:       48 0f af 45 18          imul   rax,QWORD PTR [rbp+0x18]

Strangely, gcc spills all six argument registers onto the stack and then reloads them from memory for the subsequent operations.

This only happens on -O0 (with -O1 there are no problems), but still, why? This looks like an anti-optimization to me - why would gcc do that?


I am by no means a GCC internals expert, but I'll give it a shot. Unfortunately, most of the information on GCC's register allocation and spilling seems to be out of date (it references files like local-alloc.c that no longer exist).

I'm looking at the source code of gcc-4.5-20110825.

In GNU C Compiler Internals it is mentioned that the initial function code is generated by expand_function_start in gcc/function.c. There we find the following for handling parameters:

  /* Initialize rtx for parameters and local variables.
     In some cases this requires emitting insns.  */
  assign_parms (subr);

In assign_parms, the code that decides where each argument is stored is the following:

      if (assign_parm_setup_block_p (&data))
        assign_parm_setup_block (&all, parm, &data);
      else if (data.passed_pointer || use_register_for_decl (parm))
        assign_parm_setup_reg (&all, parm, &data);
      else
        assign_parm_setup_stack (&all, parm, &data);

assign_parm_setup_block_p handles aggregate data types and is not applicable in this case. Since the data is not passed as a pointer either, GCC goes on to check use_register_for_decl.

Here the relevant part is:

  if (optimize)
    return true;

  if (!DECL_REGISTER (decl))
    return false;

DECL_REGISTER tests whether the variable was declared with the register keyword. And now we have our answer: Most parameters live on the stack when optimizations are not enabled, and are then handled by assign_parm_setup_stack. The route taken through the source code before it ends up spilling the value is slightly more complicated for pointer arguments, but can be traced in the same file if you're curious.
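If you want to see the DECL_REGISTER check in action, here is a minimal sketch. The function names are made up for illustration, and I'm assuming a GCC that behaves like the version quoted above. Compiled with gcc -O0 -S, the first function should copy its arguments to stack slots in the prologue, while the second should not need those slots because its parameters are declared register.

```c
/* Hypothetical example: compile with `gcc -O0 -S` and compare the two
   prologues. Both functions compute the same result. */

/* Plain parameters: with optimization off, use_register_for_decl
   returns false for these, so each one gets a stack slot. */
long sum_plain(long a, long b, long c)
{
    return a + b + c;
}

/* `register` parameters: DECL_REGISTER is set, so the early
   `return false` quoted above is not taken for them. */
long sum_register(register long a, register long b, register long c)
{
    return a + b + c;
}
```

The exact instructions will vary between GCC versions and targets, so treat the difference in the prologues as something to observe rather than a guarantee.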

Why does GCC spill all arguments and local variables with optimizations disabled? To help debugging. Consider this simple function:

1 extern int bar(int);
2 int foo(int a) {
3         int b = bar(a | 1);
4         b += 42;
5         return b;
6 }

Compiled with gcc -O1 -c this generates the following on my machine:

 0: 48 83 ec 08             sub    $0x8,%rsp
 4: 83 cf 01                or     $0x1,%edi
 7: e8 00 00 00 00          callq  c <foo+0xc>
 c: 83 c0 2a                add    $0x2a,%eax
 f: 48 83 c4 08             add    $0x8,%rsp
13: c3                      retq   

That is fine, except that if you break on line 5 and try to print the value of a, you get:

(gdb) print a
$1 = <value optimized out>

This happens because the register holding the argument gets overwritten: a is not used after the call to bar, so the compiler has no reason to preserve it.
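For contrast, here is a hedged variant in which a is still needed after the call. The name foo_keep and the body of bar are my own stand-ins, not part of the original example. Because a is live across the call, an optimizing compiler has to keep a copy of it somewhere (typically a callee-saved register or a stack slot), and a debugger will usually be able to print it at the return statement.

```c
/* Stand-in definition so the example links on its own; the real bar
   is external. noinline (a GCC/Clang attribute) keeps the call in place. */
__attribute__((noinline)) int bar(int x)
{
    return x * 2;
}

/* Same shape as foo, but a is used after the call to bar,
   so its value must survive the call. */
int foo_keep(int a)
{
    int b = bar(a | 1);
    b += 42;
    return b + a;
}
```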


A couple of reasons:

  1. In the general case, an argument to a function has to be treated like a local variable, because it could be stored to or have its address taken within the function. Therefore, it is simplest to just allocate a stack slot for every argument.
  2. Debug information becomes much simpler to emit with stack locations: the argument's value is always at some specific location, instead of moving around between registers and memory.
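To illustrate the first point, here is a small sketch of a parameter that is both stored to and has its address taken. The names bump and add_ten are hypothetical. Once &n appears, n must have a memory location for at least part of its lifetime, so giving every argument a stack slot up front covers this case without any extra analysis.

```c
/* Hypothetical helper that writes through a pointer. */
static void add_ten(long *p)
{
    *p += 10;
}

long bump(long n)
{
    n *= 2;        /* the parameter is stored to, like any local variable */
    add_ten(&n);   /* its address is taken, so it needs a memory location */
    return n;
}
```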

When you're looking at -O0 code in general, keep in mind that the compiler's top priorities are reducing compile time as much as possible and generating high-quality debugging information.
