Pad instruction so end is aligned

I'm working with GNU assembler on i386, generally under 32-bit Linux (I'm also aiming for a solution under Cygwin).

I have a "stub" function:

    .align 4
stub:
    call *trampoline
    .align 4
stub2:

trampoline:
    ...

The idea is that the data between stub and stub2 will be copied into allocated memory, along with a function pointer and some context data. When the memory is called, the first instruction in it will push the address of the next instruction and go to trampoline which will read the address off the stack and figure out the location of the accompanying data.

Now, stub gets compiled to:

ff 15 44 00 00 00      call *0x44
66 90                  xchg %ax,%ax

This is a call to an absolute address, which is good because the address of the call is unknown. The padding has been turned into what I guess is a do-nothing operation, which is fine and anyway it will never be executed, because trampoline will rewrite the stack before jumping to the function pointer.

The problem is that the return address pushed by this call will point to the non-aligned xchg instruction, rather than the aligned data just past it. This means trampoline needs to correct the alignment to find the data. This isn't a serious problem but it would be slightly preferable to generate something like:

66 90                  xchg %ax,%ax
ff 15 44 00 00 00      call *0x44
# Data will be placed starting here

So that the return address points directly at the data. The question is, then: how can I pad the instruction so that the end of it is aligned?
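For reference, the alignment correction that trampoline currently needs could look roughly like the sketch below. It assumes 4-byte alignment as in the `.align 4` directives above; the choice of `%eax` is only illustrative and is not taken from the original code.

```asm
    # Sketch only: round the pushed return address up to the next
    # multiple of 4. The register %eax is a hypothetical choice.
    movl (%esp), %eax       # return address: points at the padding after the call
    addl $3, %eax           # add (alignment - 1) ...
    andl $-4, %eax          # ... then clear the low two bits to round up
```

With the 6-byte call shown above, the return address is stub + 6, and these two instructions turn it into stub + 8, which is where the data starts.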

Edit: A little background (for those who haven't already guessed). I'm trying to implement closures. In the language,

(int -> int) make_curried_adder(int x)
{
    return int lambda (int y) { return x + y; };
}

(int -> int) plus7;
plus7 = make_curried_adder(7);
print("7 + 5 = ", plus7(5));

The { return x + y } is translated into a normal but anonymous function of two parameters. A block of memory is allocated and populated with the stub instructions, the address of the function, and the value 7. This is returned by make_curried_adder and when called will push the additional argument 7 on the stack then jump to the anonymous function.

Update

I've accepted Pascal's answer, which is that assemblers tend to be written to run in a single pass. I think some assemblers do have more than one pass to deal with code like "call x; ... ; x: ...", which has a forward reference. (In fact I wrote one a long time ago -- it would go back and fill in the correct address once it had reached x.) Or perhaps all such holes are left for the linker to close. Another problem with end-padding is that you need syntax to say "insert padding here so that a later location is aligned". I can think of an algorithm that would work for simple cases like that, but it may be such an obscure feature as to not be worth implementing. More complicated cases with nested padding might have contradictory results...

Unfortunately, most assemblers are simple one-pass translators, which limits the flexibility of the alignment directives they can offer. Even among the alignment options that a multi-pass assembler could offer, many are neglected because they are too specific. Yours is one of those, I am afraid. It could work in a one-pass assembler as long as it's only one instruction you intend to move, but it's very specific.

I have seen the manual of a sophisticated multi-pass assembler that let you subtract the addresses of two labels to get the length of a sequence of instructions, and would let you place a directive wherever you chose to insert a run of NOPs of, say, (4 - this length modulo 4) bytes (as long as it remained possible to converge on a definite position for each instruction). I can't remember which assembler it was. Definitely not gas, which is one-pass as far as I know. It may have been the venerable A386.

Is there a problem with adding your own xchg instruction prior to the call? Since you have an align just prior to stub, the alignment should be consistent.
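A minimal sketch of that suggestion, assuming the call keeps the 6-byte encoding shown in the question (ff 15 followed by a 32-bit address) and that the block is copied to a 4-byte-aligned address. The `call *trampoline` line is copied verbatim from the question; the padding is written as raw bytes so its length does not depend on how the assembler encodes the mnemonic.

```asm
    .align 4
stub:
    .byte 0x66, 0x90        # xchg %ax,%ax: explicit 2-byte no-op before the call
    call *trampoline        # ff 15 <addr32>: 6 bytes, so the stub totals 8 bytes
stub2:                      # 8 bytes past an aligned start, so no .align is needed
```

Unlike the trailing padding in the original layout, this no-op is executed each time the stub is called, but it has no effect. The padding length is hard-coded, so it has to be revisited if the call instruction ever changes size.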

Have you considered putting the data before the code?

This way it is only a subtraction (of the length of the stub code plus some constant offset) to get to the address of the data, so it's one instruction instead of the two you were ready to accept. And I believe that gas will give you the length of the stub code (as the difference of two labels) without problem, since in this case the labels are used after they have been defined.

Assuming the data is made of 32-bit words, there is also less padding involved compared to your initial solution (although I am not sure why there are so many .align directives in your initial solution, probably some orthogonal constraint that you didn't get into).
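The data-before-code layout could be sketched as follows. This assumes two 32-bit data words (the function pointer and one context value, as described in the question); the labels `data` and `stub_end`, the symbol `DATA_OFFSET`, and the use of `%eax` are hypothetical names chosen for illustration.

```asm
    .align 4
data:
    .long 0                 # placeholder: function pointer, filled in per closure
    .long 0                 # placeholder: context value, filled in per closure
stub:
    call *trampoline        # pushes the address of stub_end
stub_end:

    # Distance from the pushed return address back to the data,
    # computed from labels that are already defined at this point.
    .set DATA_OFFSET, stub_end - data

    # Inside trampoline (fragment only):
    popl %eax               # return address pushed by the call (= stub_end)
    subl $DATA_OFFSET, %eax # %eax now points at the data words
```

With this layout the callable address of a closure is the copied `stub`, not the start of the allocated block, and the return address no longer needs to be aligned at all.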