The following 32bit x86 Linux program prints a string of arbitrary length (as long as a program can be, anyway) and does exit(0)
afterwards:
.global _start ; notice on entry here, all regs but %esp are zero
_start:
call .L0 ; offset == strlen, provided by your assembler
.byte 'H','e','l','l','o',',',' ','W','o','r','l','d'
.L0:
pop %ecx ; ret addr is starting addr of string
mov -4(%ecx),%edx ; argument to `call`, 4 bytes: strlen
inc %ebx ; stdout == 1
movb $4, %al ; SYS_write == 4
int $0x80
xchg %eax,%ebp ; %ebp is still zero
xchg %eax,%ebx ; SYS_exit == 1, return value == 0
int $0x80
If one's willing to sacrifice the position-independence (instead, force the linker to insert the string address), and not care about the program returning zero, one can get it down to:
.global _start
_start:
movb $4, %al
inc %ebx
mov $.L0, %ecx ; this address is calculated when linking
movb $.Lend-.L0, %dl ; strlen, calculated by assembler
int $0x80
xchg %eax,%ebx
int %0x80
.L0:
.byte 'H','e','l','l','o',',',' ','W','o','r','l','d'
.Lend:
Both of these can be assembled/linked via as --32 -o x.o x.S; ld -s -m elf_i386 x.o
, and run just fine. The second one is 26 Bytes of code. If you perm开发者_如何学Cit a crash after printing Hello, World
then leave the last two instructions out, 23 Bytes. That's as low as I could go.
Question that's always bugged me, is it possible to squeeze a few more bytes off this ? Pure speculation of mine gives these possible leads:
- Somehow use parts of the 'Hello, World' itself as code ?
- Anyone knows a usable syscall easter egg ?
- trick the linker into making the entrypoint a 16-bit address so that
movw $.L0, %cx
could be used (saves one byte) ? - Do an 8-bit offset
jmp
to a place that's known (or created via assembler / linker invocation magic) to contain the necessary instructions for theexit(...)
syscall, saving one byte over thexchg; int
sequence ?
Or else, can it be proven that this actually is the smallest well-behaved (no crash / return code zero) Linux/x86 "Hello, World" ?
Edit
To clarify, the question is not about minimizing the size of the ELF executable; techniques for that are long-known. I'm explicitly inquiring about the size of a Linux 32bit x86 assembly program that performs the equivalent of what the compiled code for:
int main(int argc, char **argv)
{
puts("Hello, World");
exit(0); /* or whatever code */
}
would do.
In fact, I'll be happy about anything that does not require manual editing of the ELF headers. If you find a way to e.g. stuff the"Hello, World"
into some ELF object and referencing that from the assembly source, using only the assembler / linker command line and/or a mapfile input, I'd consider it valid enough, even though that increases the size of the ELF executable. I just want to know if the instruction sequence to print "Hello, World" and exit()
afterwards can be shrunk still.
The question is about code size, not executable size.It's been done back in 1999. Have a look at this page (spoiler: the end result is a 45-byte ELF file). Make sure to read the postscript too.
A straightforward translation of the C code, using libc, results in 16 bytes of instructions:
.S:
.asciz "Hello, World"
.globl main
main:
push $.S
call puts
add $4, %esp
xor %eax, %eax
ret
If you use x86-64 instead if x86-32, the calling convention passes arguments in registers so we can skip the stack manipulation, and
main:
mov $.S, %rdi
call puts
xor %eax, %eax
ret
is only 15 bytes of code.
精彩评论