
Is it possible to execute code from the stack in standard C?

Developer https://www.devze.com 2023-01-16 18:25 Source: Internet

The following code doesn't work as intended but hopefully illustrates my attempt:

long foo (int a, int b) {
  return a + b;
}

void call_foo_from_stack (void) {
  /* reserve space on the stack to store foo's code */
  char code[sizeof(*foo)];

  /* have a pointer to the beginning of the code */
  long (*fooptr)(int, int) = (long (*)(int, int)) code;

  /* copy foo's code to the stack */
  memcpy(code, foo, sizeof(*foo));

  /* execute foo from the stack */
  fooptr(3, 5);
}

Obviously, sizeof(*foo) doesn't return the size of the code of the foo() function.

I am aware that executing code on the stack is restricted on some CPUs (or at least when a restriction flag is set). Apart from GCC's nested functions, which may end up being stored on the stack, is there a way to do that in standard C?


A valid use case for this kind of thing is an embedded system that normally executes from FLASH memory but must be able to reprogram itself in the field. To do this, a portion of the code must run from some other memory device. In my case the FLASH device could not erase and program one page while allowing reads from any other page (although some devices can), and there was enough RAM in the system to hold both the flash writer and the new application image to be written.

We wrote the necessary FLASH programming function in C, but used #pragma directives to place it in a .text segment distinct from the rest of the code. In the linker control file, we had the linker define global symbols for the start and end of that segment and locate it at a base address in RAM, while placing the generated code in a load region located in the FLASH alongside the initialization data for the .data segment and the pure read-only .rodata segment. The base address in the FLASH was computed and defined as a global symbol as well.

At run time, when the application update feature was exercised, we read the new application image into its buffer (and did all the sanity checks that should be done to make sure it actually was an application image for this device). We then copied the update kernel out of its dormant location in FLASH to its linked location in RAM (using the global symbols defined by the linker), then called it just like any other function. We didn't have to do anything special at the call site (not even a function pointer) because as far as the linker was concerned it was located in RAM the whole time. The fact that during normal operation that particular piece of RAM had a very different purpose was not important to the linker.
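The copy step described above can be sketched roughly as follows. The helper load_ram_region and the symbol names in the comment (such as __ramfunc_load_start) are hypothetical; real names come from your own linker control file, and none of this is defined by the C standard.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical linker-defined symbols (the names are assumptions; a real
 * linker control file would define its own):
 *
 *   extern const unsigned char __ramfunc_load_start[];   image in FLASH
 *   extern const unsigned char __ramfunc_load_end[];
 *   extern unsigned char       __ramfunc_run_start[];    linked address in RAM
 */

/* Copy a code image from its load region to its run region.
 * Returns the number of bytes copied. */
size_t load_ram_region(unsigned char *run_start,
                       const unsigned char *load_start,
                       const unsigned char *load_end)
{
    size_t size = (size_t)(load_end - load_start);
    memcpy(run_start, load_start, size);
    return size;
}
```

On the target, the call would look something like load_ram_region(__ramfunc_run_start, __ramfunc_load_start, __ramfunc_load_end), followed by an ordinary call to the function that was linked to run from RAM.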

That said, all of the machinery that made this possible is either outside the scope of the standard or solidly implementation-defined behavior. The standard doesn't care how code gets loaded into memory before it is executed. It just says that the system can execute code.


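sizeof(*foo) isn't the size of the function foo. The operand *foo has function type, and standard C does not allow sizeof to be applied to an expression of function type; GCC accepts it as an extension and yields 1. Applying sizeof to a pointer to foo only gives the size of that pointer (which will usually be the same size as every other pointer on your platform).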

sizeof can't measure the size of a function. The reason is that sizeof is evaluated at compile time (variable-length arrays aside), and the size of a function's machine code is not known until the compiler has generated it.

Since the size of a function is not known at compile time, that also means that you can't define a statically sized array that is large enough to contain a function.
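A minimal sketch of what sizeof can and cannot report here. The helper names are made up for illustration. Applying sizeof to a pointer to foo is well defined but says nothing about the size of foo's code; the commented-out line is the form that standard C rejects.

```c
#include <stddef.h>

long foo(int a, int b) {
    return a + b;
}

/* Size of a pointer to foo: well defined, but unrelated to foo's code size. */
size_t size_of_foo_pointer(void) {
    return sizeof(&foo);
}

/* Size of the matching pointer type, for comparison. */
size_t size_of_pointer_type(void) {
    return sizeof(long (*)(int, int));
}

/*
 * size_t bad = sizeof(*foo);
 *   Constraint violation in standard C: the operand has function type.
 *   GCC accepts it as an extension and evaluates it to 1.
 */
```

Both helpers return the same value, the size of a function pointer, which is typically 4 or 8 bytes regardless of how long foo is.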

You might be able to do something horrible using alloca and some nasty hacks, but the short answer is no, I don’t think you can do this with standard C.

It should also be noted that the stack is not executable on modern, secure operating systems. In some cases you might be able to make it executable, but that is a very bad idea that will leave your program wide open to stack smashing attacks and horrible bugs.


Aside from all the other problems, I don't think anyone has yet mentioned that code in its final form in memory cannot in general be relocated. Your example foo function, maybe, but consider:

int main(int argc, char **argv) {
    if (argc == 3) {
        return 1;
    } else {
        return 0;
    }
}

Part of the result:

    if (argc == 3) {
  401149:       83 3b 03                cmpl   $0x3,(%ebx)
  40114c:       75 09                   jne    401157 <_main+0x27>
        return 1;
  40114e:       c7 45 f4 01 00 00 00    movl   $0x1,-0xc(%ebp)
  401155:       eb 07                   jmp    40115e <_main+0x2e>
    } else {
        return 0;
  401157:       c7 45 f4 00 00 00 00    movl   $0x0,-0xc(%ebp)
  40115e:       8b 45 f4                mov    -0xc(%ebp),%eax
    }

Note the jne 401157 <_main+0x27>. In this case, we have an x86 conditional short jump instruction, 0x75 0x09, which goes 9 bytes forward from the end of the instruction. So that's relocatable: if we copy the code elsewhere then we still want to go 9 bytes forward. But what if it were a relative jump or call to code that isn't part of the function you copied? You'd jump to some arbitrary location on or near your stack.
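The arithmetic behind that relative jump can be sketched as below. The function name short_jump_target is made up for this illustration, and the second address in the usage note is an arbitrary stand-in for a stack location.

```c
#include <stdint.h>

/* Target of an x86 short jump (1 opcode byte + 1 displacement byte):
 * the signed displacement is added to the address of the NEXT instruction. */
uint32_t short_jump_target(uint32_t insn_addr, int8_t rel8)
{
    return insn_addr + 2u + (uint32_t)(int32_t)rel8;
}
```

For the listing above, short_jump_target(0x40114c, 0x09) gives 0x401157, matching the disassembly. If the same two bytes were copied to 0x7ffe0000, the target would become 0x7ffe000b: still 9 bytes past the end of the instruction, which is only correct if the destination code was copied along with it.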

Not all jump and call instructions are like this (not on all architectures, and not even all on x86). Some refer to absolute addresses, for example by loading the address into a register and then doing an indirect jump/call. When the code is prepared for execution, the so-called "loader" will "fix up" the code by filling in whatever address the target actually ends up having in memory. Copying such code will (at best) result in code that jumps to or calls the same address as the original. If the target isn't in the code you're copying, that's probably what you want. If the target is in the code you're copying, then you're jumping to the original instead of to the copy.

The same issues of relative vs. absolute addresses apply to things other than code. For example, references to data sections (containing string literals, global variables, etc) will go wrong if they're addressed relatively and aren't part of the copied code.

Also, a function pointer doesn't necessarily contain the address of the first instruction in the function. For example, on an ARM processor in ARM/Thumb interworking mode, the address of a Thumb function is 1 greater than the address of its first instruction. In effect, the least significant bit of the value isn't part of the address; it's a flag that tells the CPU to switch to Thumb mode as part of the jump.
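As a rough illustration of the Thumb case, the flag bit can be separated from the code address like this. The helper names and the example addresses are invented, and converting a real function pointer to uintptr_t is itself implementation-defined.

```c
#include <stdint.h>

/* On ARM with Thumb interworking, bit 0 of a function pointer value is a
 * mode flag, not part of the code address. */
int is_thumb_pointer(uintptr_t fn_value)
{
    return (int)(fn_value & 1u);
}

/* Address of the first instruction: the pointer value with bit 0 cleared. */
uintptr_t code_start_address(uintptr_t fn_value)
{
    return fn_value & ~(uintptr_t)1;
}
```

With a pointer value of 0x08000101, is_thumb_pointer reports 1 and code_start_address yields 0x08000100, so a memcpy that starts at the raw pointer value would begin one byte into the function.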


If you need to measure the size of a function, have the compiler/linker output a map file and you can calculate function size based off of that information.


Your OS shouldn't let you do that easily. There shouldn't be any memory with both write and execute permissions, and the stack in particular has many different protections (see ExecShield, OpenWall patches, ...). IIRC, SELinux also includes stack execution restrictions. You'll have to find a way to do one or more of:

  • Disable stack protection at the OS level.
  • Allow execution from the stack on a particular executable file.
  • mprotect() the stack.
  • Maybe some other things...
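The mprotect() option might look roughly like the following on a POSIX system. This is a sketch under assumptions, not standard C: the helper names are made up, and a hardened kernel or security policy may simply refuse the request.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of its page
 * (page_size must be a power of two). */
uintptr_t page_floor(uintptr_t addr, uintptr_t page_size)
{
    return addr & ~(page_size - 1);
}

/* Ask the OS to make the page(s) holding buf[0..len) executable.
 * Returns 0 on success, -1 if the OS refuses (errno is set by mprotect). */
int make_executable(void *buf, size_t len)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0 || len == 0)
        return -1;

    uintptr_t page_size = (uintptr_t)ps;
    uintptr_t start = page_floor((uintptr_t)buf, page_size);
    uintptr_t end = (uintptr_t)buf + len;

    /* mprotect wants a page-aligned start address. */
    return mprotect((void *)start, (size_t)(end - start),
                    PROT_READ | PROT_WRITE | PROT_EXEC);
}
```

Calling make_executable(code, sizeof code) on a local array changes the permissions of the whole page or pages around it, including neighbouring stack data, which is part of why this is discouraged.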


The reserve and copy parts of your idea are fine. Getting a code pointer to your awesome stack code/data, that's harder. A typecast of the address of your stack to a code pointer should do the trick.


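{
   /* buffer on the stack that will hold the copied machine code */
   unsigned char code[256];

   /* object-to-function pointer cast: not standard C, but a common extension */
   int (*pt2Function)(void) = (int (*)(void))code;

   /* call through the pointer (the array itself is not callable);
      only meaningful once valid machine code has been copied into code[] */
   pt2Function();
}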

On a managed system, this code should never be allowed to execute. On an embedded system that shares code and data memory, it should work just fine. There are of course caching issues, security issues, job security issues when your peers read the code, etc. with this though...


There are lots of ways that trying to do this can go wrong, but it can and has been done. This is one of the ways that buffer overflow attacks have worked -- write in a small malicious program for what is likely the architecture of the target computer along with code and/or data that is likely to get the processor to end up executing the malicious code and hope for the worst.

There have also been less evil uses of this, but it generally is restricted by the OS and/or CPU. Some CPUs can't allow this at all since the code and stack memory are in different address spaces.

One thing you will need to account for if you do want to do this: the code that you write into the stack space must be position-independent (whether it was compiled that way or written that way in assembly or machine code), or you must make sure that it ends up at a specific address that it was written or compiled to expect.

I don't think that the C standard says anything about this.


Your problem is roughly similar to dynamically generated code, except that you want to execute from stack instead of a generic memory region.

You'll need to grab enough stack to fit the copy of your function. You can find out how large the foo() function is by compiling it and looking at the resulting assembly. Then hard-code the size of your code[] array to fit at least that much. Also make sure code[], or the way you copy foo() into code[], gives the copied function the correct instruction alignment for your processor architecture.

If your processor has an instruction prefetch buffer then you will need to flush it after the copy and prior to executing the function from stack, or it will almost certainly have prefetched the wrong data and you'll end up executing garbage. Managing the prefetch buffer and associated caches is the biggest stumbling block I've encountered in experimenting with dynamically generated code.

As others have mentioned, if your stack isn't executable then this is a non-starter.
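For comparison, here is a sketch of the general "dynamically generated code" sequence: copy, change permissions, synchronize caches, call. It deliberately uses an mmap'd region instead of the stack. It assumes a POSIX system and GCC or Clang (for __builtin___clear_cache); the function names are made up, and the machine code bytes are specific to x86.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif

typedef int (*int_fn)(void);

/* Copy machine code into a fresh mapping, make it executable, and call it.
 * Returns the function's result, or -1 if any step is refused. */
int run_copied_code(const unsigned char *code, size_t len)
{
    size_t region = 4096;  /* requested size; the kernel rounds up to pages */
    if (len == 0 || len > region)
        return -1;

    unsigned char *mem = mmap(NULL, region, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    memcpy(mem, code, len);

    /* Drop write permission when enabling execute. */
    if (mprotect(mem, region, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, region);
        return -1;
    }

    /* GCC/Clang builtin: make the copied bytes visible to instruction fetch. */
    __builtin___clear_cache((char *)mem, (char *)mem + len);

    int result = ((int_fn)mem)();  /* object-to-function cast: non-standard */
    munmap(mem, region);
    return result;
}

/* x86 and x86-64 only: the bytes encode "mov eax, 42; ret".
 * On other CPUs nothing is executed and -1 is returned. */
int run_x86_demo(void)
{
#if defined(__x86_64__) || defined(__i386__)
    static const unsigned char code[] = { 0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3 };
    return run_copied_code(code, sizeof code);
#else
    return -1;
#endif
}
```

On x86 the cache-flush builtin is effectively a no-op, but on architectures with separate instruction and data caches it is the step that stops the CPU from executing stale bytes.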


As others have said, it's not possible to do this in a standard way - what you end up with will be platform-specific: CPU because of the way opcodes are structured (relative vs. absolute references), OS because you'll likely need to set page protection to be allowed to execute from stack. Furthermore, it's compiler-dependent: there's no standard-and-guaranteed way to get the size of a function.

If you really do have a good use-case, like the flash reprogramming RBerteig mentions, be prepared to mess with linker scripts, verify disassembly, and know you're writing very non-standard and unportable code :)
