I'm doing some experimenting with x86-64 assembly. Having compiled this dummy function:
long myfunc(long a, long b, long c, long d,
long e, long f, long g, long h)
{
long xx = a * b * c * d * e * f * g * h;
long yy = a + b + c + d + e + f + g + h;
long zz = utilfunc(xx, yy, xx % yy);
return zz + 20;
}
With gcc -O0 -g
I was surprised to find the following in the beginning o开发者_JAVA技巧f the function's assembly:
0000000000400520 <myfunc>:
400520: 55 push rbp
400521: 48 89 e5 mov rbp,rsp
400524: 48 83 ec 50 sub rsp,0x50
400528: 48 89 7d d8 mov QWORD PTR [rbp-0x28],rdi
40052c: 48 89 75 d0 mov QWORD PTR [rbp-0x30],rsi
400530: 48 89 55 c8 mov QWORD PTR [rbp-0x38],rdx
400534: 48 89 4d c0 mov QWORD PTR [rbp-0x40],rcx
400538: 4c 89 45 b8 mov QWORD PTR [rbp-0x48],r8
40053c: 4c 89 4d b0 mov QWORD PTR [rbp-0x50],r9
400540: 48 8b 45 d8 mov rax,QWORD PTR [rbp-0x28]
400544: 48 0f af 45 d0 imul rax,QWORD PTR [rbp-0x30]
400549: 48 0f af 45 c8 imul rax,QWORD PTR [rbp-0x38]
40054e: 48 0f af 45 c0 imul rax,QWORD PTR [rbp-0x40]
400553: 48 0f af 45 b8 imul rax,QWORD PTR [rbp-0x48]
400558: 48 0f af 45 b0 imul rax,QWORD PTR [rbp-0x50]
40055d: 48 0f af 45 10 imul rax,QWORD PTR [rbp+0x10]
400562: 48 0f af 45 18 imul rax,QWORD PTR [rbp+0x18]
gcc
very strangely spills all argument registers onto the stack and then takes them from memory for further operations.
This only happens on -O0
(with -O1
there are no problems), but still, why? This looks like an anti-optimization to me - why would gcc
do that?
I am by no means a GCC internals expert, but I'll give it a shot. Unfortunately most of the information on GCCs register allocation and spilling seems to be out of date (referencing files like local-alloc.c
that don't exist anymore).
I'm looking at the source code of gcc-4.5-20110825
.
In GNU C Compiler Internals it is mentioned that the initial function code is generated by expand_function_start
in gcc/function.c
. There we find the following for handling parameters:
4462 /* Initialize rtx for parameters and local variables.
4463 In some cases this requires emitting insns. */
4464 assign_parms (subr);
In assign_parms
the code that handles where each arguments is stored is the following:
3207 if (assign_parm_setup_block_p (&data))
3208 assign_parm_setup_block (&all, parm, &data);
3209 else if (data.passed_pointer || use_register_for_decl (parm))
3210 assign_parm_setup_reg (&all, parm, &data);
3211 else
3212 assign_parm_setup_stack (&all, parm, &data);
assign_parm_setup_block_p
handles aggregate data types and is not applicable in this case and since the data is not passed as a pointer GCC checks use_register_for_decl
.
Here the relevant part is:
1972 if (optimize)
1973 return true;
1974
1975 if (!DECL_REGISTER (decl))
1976 return false;
DECL_REGISTER
tests whether the variable was declared with the register
keyword. And now we have our answer: Most parameters live on the stack when optimizations are not enabled, and are then handled by assign_parm_setup_stack
. The route taken through the source code before it ends up spilling the value is slightly more complicated for pointer arguments, but can be traced in the same file if you're curious.
Why does GCC spill all arguments and local variables with optimizations disabled? To help debugging. Consider this simple function:
1 extern int bar(int);
2 int foo(int a) {
3 int b = bar(a | 1);
4 b += 42;
5 return b;
6 }
Compiled with gcc -O1 -c
this generates the following on my machine:
0: 48 83 ec 08 sub $0x8,%rsp
4: 83 cf 01 or $0x1,%edi
7: e8 00 00 00 00 callq c <foo+0xc>
c: 83 c0 2a add $0x2a,%eax
f: 48 83 c4 08 add $0x8,%rsp
13: c3 retq
Which is fine except if you break on line 5 and try to print the value of a, you get
(gdb) print a
$1 = <value optimized out>
As the argument gets overwritten since it's not used after the call to bar
.
A couple of reasons:
- In the general case, an argument to a function has to be treated like a local variable because it could be stored to or have its address taken within the function. Therefore, it is simplest to just allocate a stack slot for every arguments.
- Debug information becomes much simpler to emit with stack locations: the argument's value is always at some specific location, instead of moving around between registers and memory.
When you're looking at -O0 code in general, consider that the compiler's top priorities are reducing compile-time as much as possible and generating high-quality debugging information.
精彩评论