I decided it would be fun to learn x86 assembly during the summer break. So I started with a very simple hello world program, borrowing from the examples that gcc -S could give me. I ended up with this:
HELLO:
    .ascii "Hello, world!\12\0"
    .text
    .globl _main
_main:
    pushl %ebp          # 1. puts the base stack address on the stack
    movl %esp, %ebp     # 2. puts the base stack address in the stack address register
    subl $20, %esp      # 3. ???
    pushl $HELLO        # 4. push HELLO's address on the stack
    call _puts          # 5. call puts
    xorl %eax, %eax     # 6. zero %eax, probably not necessary since we didn't do anything with it
    leave               # 7. clean up
    ret                 # 8. return
                        # PROFIT!
It compiles and even works! And I think I understand most of it.
However, magic happens at step 3. If I remove this line, my program dies between the call to puts and the xor with a misaligned stack error. And if I change $20 to another value, it crashes too. So I came to the conclusion that this value is very important.
The problem is, I don't know what it does or why it's needed.
Can anyone explain it to me? (I'm on Mac OS, in case it matters.)
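On 32-bit x86 OS X, the stack needs to be 16-byte aligned at the point of every function call (see the ABI documentation). Counting from the last aligned address, %esp has moved by the following amounts just before the call to puts:
return address pushed by the call into _main    -4
push base pointer (#1)                           -4
strange increment (#3)                          -20
push argument (#4)                               -4
total                                           -32
Since 32 is a multiple of 16, the stack is aligned when puts is called (#5).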
To check, change line #3 from $20 to $4, which also works.
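The arithmetic above can be sketched in C. This is only an illustration, not part of the original program: the helper names bytes_below_aligned and aligned_at_call are made up here, and the sketch assumes that the code calling _main obeyed the same 16-byte rule, so that the return address it pushed is the first 4 bytes below an aligned address.

```c
#include <assert.h>

enum { WORD = 4, ALIGNMENT = 16 };

/* Hypothetical helper: how many bytes %esp sits below the last
 * 16-byte-aligned address right before `call _puts` executes.
 * Assumes the stack was aligned when _main itself was called. */
int bytes_below_aligned(int local_bytes)
{
    assert(local_bytes >= 0);

    int total = 0;
    total += WORD;         /* return address pushed by the call into _main */
    total += WORD;         /* pushl %ebp        (#1) */
    total += local_bytes;  /* subl $N, %esp     (#3) */
    total += WORD;         /* pushl $HELLO      (#4) */
    return total;
}

/* Hypothetical helper: nonzero when the stack would be 16-byte
 * aligned at the call to puts for the given subl amount. */
int aligned_at_call(int local_bytes)
{
    return bytes_below_aligned(local_bytes) % ALIGNMENT == 0;
}
```

Under that assumption, aligned_at_call(20) and aligned_at_call(4) both hold, while aligned_at_call(0) does not, which matches the crash seen when line #3 is removed. Any value of the form 16k + 4 should satisfy the same check.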
Also, as Ignacio Vazquez-Abrams points out, #6 is not optional. Registers contain remnants of previous calculations (here, %eax holds whatever puts returned), so %eax has to be zeroed explicitly for main to return 0.
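A rough C counterpart may make the role of #6 clearer. This is a sketch, not the source the asker actually compiled: say_hello and puts_result are hypothetical names standing in for the body of _main. On 32-bit x86, an int return value travels back in %eax, so the result of puts is still sitting there until the function overwrites it.

```c
#include <stdio.h>

/* Hypothetical helper: returns whatever puts reported. On success
 * the C standard only promises a non-negative value, not zero. */
int puts_result(void)
{
    return puts("Hello, world!");
}

/* Hypothetical stand-in for the body of _main. */
int say_hello(void)
{
    int leftover = puts("Hello, world!"); /* comes back in %eax on x86 */
    (void)leftover;                       /* ignored, as in the assembly */
    return 0;                             /* the job of xorl %eax, %eax */
}
```

Without the explicit return 0, the caller would see the leftover value from puts as the exit status of the program.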
I recently started learning assembly, too (and I'm still learning). To save you the shock: 64-bit calling conventions are MUCH different (parameters are passed in registers). I found this very helpful for 64-bit assembly.
The general form of the comment should be "Allocates space for local variables". I'm not sure why changing it arbitrarily would crash the program; I can only see it crashing if you reduce it. And the proper comment for 6 is "Prepare to return a 0 from this function".
Note that if you compile with -fomit-frame-pointer some of that %ebp pointer boilerplate will disappear. The base pointer is helpful for debugging but isn't actually necessary on x86.
Also I highly recommend using Intel syntax, which is supported by all the GCC/binutils stuff. I used to think that the difference between AT&T and Intel syntax was just a matter of taste, but then one day I came across this example where the AT&T mnemonic is just totally different from the Intel one. And since all the official x86 documentation uses Intel syntax, it seems like a better way to go.
Have fun!