A program has three sections: text, data and stack. The function body lives in the text section. Can we let a function body live on heap? Because we can manipulate memory on heap more freely, we may gain more freedom to manipulate functions.
In the following C code, I copy the text of hello function onto heap and then point a function pointer to it. The program compiles fine by gcc but gives "Segmentation fault" when running.
Could you tell me why? If my program can not be repaired, could you provide a way to let a function live on heap? Thanks!
Turing开发者_StackOverflow中文版.robot
#include "stdio.h"
#include "stdlib.h"
#include "string.h"
void
hello()
{
printf( "Hello World!\n");
}
int main(void)
{
void (*fp)();
int size = 10000; // large enough to contain hello()
char* buffer;
buffer = (char*) malloc ( size );
memcpy( buffer,(char*)hello,size );
fp = buffer;
fp();
free (buffer);
return 0;
}
My examples below are for Linux x86_64
with gcc
, but similar considerations should apply on other systems.
Can we let a function body live on heap?
Yes, absolutely we can. But usually that is called JIT (Just-in-time) compilation. See this for basic idea.
Because we can manipulate memory on heap more freely, we may gain more freedom to manipulate functions.
Exactly, that's why higher level languages like JavaScript have JIT compilers.
In the following C code, I copy the text of hello function onto heap and then point a function pointer to it. The program compiles fine by gcc but gives "Segmentation fault" when running.
Actually you have multiple "Segmentation fault"
s in that code.
The first one comes from this line:
int size = 10000; // large enough to contain hello()
If you see x86_64
machine code generated by gcc
of your
hello
function, it compiles down to mere 17 bytes:
0000000000400626 <hello>:
400626: 55 push %rbp
400627: 48 89 e5 mov %rsp,%rbp
40062a: bf 98 07 40 00 mov $0x400798,%edi
40062f: e8 9c fe ff ff call 4004d0 <puts@plt>
400634: 90 nop
400635: 5d pop %rbp
400636: c3 retq
So, when you are trying to copy 10,000 bytes, you run into a memory
that does not exist and get "Segmentation fault"
.
Secondly, you allocate memory with malloc
, which gives you a slice of
memory that is protected by CPU against execution on Linux x86_64
, so
this would give you another "Segmentation fault"
.
Under the hood malloc
uses system calls like brk
, sbrk
, and mmap
to allocate memory. What you need to do is allocate executable memory using mmap
system call with PROT_EXEC
protection.
Thirdly, when gcc
compiles your hello
function, you don't really know what optimisations it will use and what the resulting machine code looks like.
For example, if you see line 4 of the compiled hello
function
40062f: e8 9c fe ff ff call 4004d0 <puts@plt>
gcc
optimised it to use puts
function instead of printf
, but that is
not even the main problem.
On x86
architectures you normally call functions using call
assembly
mnemonic, however, it is not a single instruction, there are actually many different machine instructions that call
can compile to, see Intel manual page Vol. 2A 3-123, for reference.
In you case the compiler has chosen to use relative addressing for the call
assembly instruction.
You can see that, because your call
instruction has e8
opcode:
E8 - Call near, relative, displacement relative to next instruction. 32-bit displacement sign extended to 64-bits in 64-bit mode.
Which basically means that instruction pointer will jump the relative amount of bytes from the current instruction pointer.
Now, when you relocate your code with memcpy
to the heap, you simply copy that relative call
which will now jump the instruction pointer relative from where you copied your code to into the heap, and that memory will most likely not exist and you will get another "Segmentation fault"
.
If my program can not be repaired, could you provide a way to let a function live on heap? Thanks!
Below is a working code, here is what I do:
- Execute,
printf
once to make suregcc
includes it in our binary. - Copy the correct size of bytes to heap, in order to not access memory that does not exist.
- Allocate executable memory with
mmap
andPROT_EXEC
option. - Pass
printf
function as argument to ourheap_function
to make sure thatgcc
uses absolute jumps forcall
instruction.
Here is a working code:
#include "stdio.h"
#include "string.h"
#include <stdint.h>
#include <sys/mman.h>
typedef int (*printf_t)(char* format, char* string);
typedef int (*heap_function_t)(printf_t myprintf, char* str, int a, int b);
int heap_function(printf_t myprintf, char* str, int a, int b) {
myprintf("%s", str);
return a + b;
}
int heap_function_end() {
return 0;
}
int main(void) {
// By printing something here, `gcc` will include `printf`
// function at some address (`0x4004d0` in my case) in our binary,
// with `printf_t` two argument signature.
printf("%s", "Just including printf in binary\n");
// Allocate the correct size of
// executable `PROT_EXEC` memory.
size_t size = (size_t) ((intptr_t) heap_function_end - (intptr_t) heap_function);
char* buffer = (char*) mmap(0, (size_t) size,
PROT_EXEC | PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
memcpy(buffer, (char*)heap_function, size);
// Call our function
heap_function_t fp = (heap_function_t) buffer;
int res = fp((void*) printf, "Hello world, from heap!\n", 1, 2);
printf("a + b = %i\n", res);
}
Save in main.c
and run with:
gcc -o main main.c && ./main
In principle in concept it is doable. However... You are copying from "hello" which basically contains assembly instructions that possibly call or reference or jump to other addresses. Some of these addresses get fixed up when the application loads. Just copying that and calling into it would then crash. Also some systems like windows have data execution protection that would prevent code in data form being executed, as a security measure. Also, how large is "hello"? Trying to copy past the end of it would likely also crash. And you are also dependent on how the compiler implements "hallo". Needless to say, this would be very compiler and platform dependent, if it worked.
I can imagine that this might work on a very simple architecture or with a compiler designed to make it easy.
A few of the many requirements for this work:
- All memory references would need to be absolute ... no pc-relative addresses, except . . .
- Certain control transfers would need to be pc-relative (so your copied function's local branches work) but it would be nice if other ones would just happen to be absolute, so your module's external control transfers, like
printf()
, would work.
There are more requirements. Add to this the wierdness of doing this in what is likely to already be a highly complex dynamically linked environment (did you static link it?) and you simply are not ever going to get this to work.
And as Adam points out, there are security mechanisms in place, at least for the stack, to prevent dynamically constructed code from executing at all. You may need to figure out how to turn these off.
You might also be getting clobbered with the memcpy()
.
You might learn something by tracing this through step-by-step and watching it shoot itself in the head. If the memcpy hack is the problem, perhaps try something like:
f() {
...
}
g() {
...
}
memcpy(dst, f, (intptr_t)g - (intptr_t)f)
You program is segfaulting because you're memcpy'ing more than just "hello"; that function is not 10000 bytes long, so as soon as you get past hello itself, you segfault because you're accessing memory that doesn't belong to you.
You probably also need to use mmap() at some point to make sure the memory location you're trying to call is actually executable.
There are many systems that do what you seem to want (e.g., Java's JIT compiler creates native code in the heap and executes it), but your example will be way more complicated than that because there's no easy way to know the size of your function at runtime (and it's even harder at compile time, when the compiler hasn't yet decide what optimizations to apply). You can probably do what objdump does and read the executable at runtime to find the right "size", but I don't think that's what you're actually trying to achieve here.
After malloc you should check that the pointer is not null buffer = (char*) malloc ( size );
memcpy( buffer,(char*)hello,size );
and it might be your problem since you try to allocate a big area in memory. can you check that?
memcpy( buffer,(char*)hello,size );
hello
is not a source get copied to buffer. You are cheating the compiler and it is taking it's revenge at run-time. By typecasting hello
to char*
, the program is making the compiler to believe it so, which is not the case actually. Never out-smart the compiler.
精彩评论