GCC inline asm NOP loop not being unrolled at compile time

Developer https://www.devze.com 2023-02-02 09:17 Source: Internet
Venturing out of my usual VC++ realm into the world of GCC (via MINGW32). Trying to create a Windows PE that consists largely of NOPs, ala:

for(i = 0; i < 1000; i++)
{
    asm("nop");
}

But either I'm using the wrong syntax or the compiler is optimising through them because those NOPs don't survive the compilation process.

I'm using the -O0 flag, otherwise defaults. Any ideas on how I can coax the compiler into leaving the NOPs intact?


A convenient way to get 1000 inline nops is to use the .rept directive of the GNU assembler:

void thousand_nops(void) {
    asm(".rept 1000 ; nop ; .endr");
}

Try on godbolt.
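
If the count should be a parameter rather than hard-coded, the same .rept idea can be wrapped in a macro. This is only a sketch: NOPS, NOPS_STR and after_thousand_nops are names made up for illustration, and it assumes a GNU-compatible assembler that understands .rept/.endr.

```cpp
#include <cassert>

// Two-step stringification so NOPS(COUNT) also works when COUNT is a macro.
#define NOPS_STR_(n) #n
#define NOPS_STR(n) NOPS_STR_(n)

// NOPS is a hypothetical helper, not part of the answer above.
// Newlines (not ';') separate the assembler statements here.
#define NOPS(n) __asm__ __volatile__(".rept " NOPS_STR(n) "\n\tnop\n\t.endr")

// Hypothetical wrapper: runs through 1000 NOPs, then returns its argument.
int after_thousand_nops(int x) {
    NOPS(1000);
    return x;
}

// Sanity check: the NOP block must not disturb normal control flow.
void check_after_thousand_nops() {
    assert(after_thousand_nops(42) == 42);
}
```

An asm statement without output operands is already treated as volatile by GCC, so the explicit __volatile__ is just there to make the intent obvious. Note that the compiler's -S output will still show the .rept directive rather than 1000 lines of nop; disassembling the object file (for example with objdump -d) is the way to see the expanded instructions.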


Are you expecting it to unroll the loop into 1000 nops? I did a quick test with gcc, and the (single) nop does not disappear:

        xorl    %eax, %eax
        .p2align 4,,7
.L2:
#APP
        nop
#NO_APP
        addl    $1, %eax
        cmpl    $1000, %eax
        jne     .L2

With gcc -S -O3 -funroll-all-loops I see it unroll the loop 8 times (thus 8 nops), but if you want 1000 I think it's going to be easiest to do:

#define NOP10() asm("nop;nop;nop;nop;nop;nop;nop;nop;nop;nop")

And then use NOP10(); ...
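
To avoid typing NOP10(); a hundred times, the macros can be stacked. A rough sketch of that idea follows; NOP100, NOP1000 and after_nop1000 are hypothetical names added for illustration, and only NOP10 comes from the answer above.

```cpp
#include <cassert>

// NOP10 is taken from the answer above.
#define NOP10() asm("nop;nop;nop;nop;nop;nop;nop;nop;nop;nop")

// Hypothetical helpers: each level repeats the previous one ten times.
#define NOP100() do { \
    NOP10(); NOP10(); NOP10(); NOP10(); NOP10(); \
    NOP10(); NOP10(); NOP10(); NOP10(); NOP10(); \
} while (0)

#define NOP1000() do { \
    NOP100(); NOP100(); NOP100(); NOP100(); NOP100(); \
    NOP100(); NOP100(); NOP100(); NOP100(); NOP100(); \
} while (0)

// Hypothetical wrapper: executes 1000 NOPs, then returns its argument.
int after_nop1000(int x) {
    NOP1000();
    return x;
}

// Sanity check: control flow continues normally after the NOP block.
void check_after_nop1000() {
    assert(after_nop1000(7) == 7);
}
```

This expands to 100 separate asm statements of 10 nops each. In practice they should come out back to back, but the compiler makes no formal promise that nothing is placed between separate asm statements, so it is worth checking the generated assembly if a contiguous block matters.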


This recent question about looping to 1000 without conditionals resulted in a clever answer using template recursion, which can be used to produce your 1000-nop function without repeating asm("nop") at all. There are some caveats. If you don't get the compiler to inline the function, you will end up with a 1000-deep recursive stack of individual nop functions. Also, gcc's default template depth limit is 500 (newer GCC releases document a default of 900), so you must specify a higher limit explicitly (see below), though you could simply avoid exceeding nop<500>().

// compile time recursion
template<int N> inline void nop()
{
    nop<N-1>();
    asm("nop");
}

template<> inline void nop<0>() { }

void nops()
{
    nop<1000>();
}

Compiled with:

 g++ -O2 -ftemplate-depth=1000 ctr.c
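
One possible way to address the inlining caveat without relying on -O2 is GCC's always_inline attribute. The following is a sketch under that assumption, not something the answer above prescribes; nops_inlined is a hypothetical name, and the depth is kept at 100 here so that no -ftemplate-depth flag is needed.

```cpp
#include <cassert>

// Same compile-time recursion as above, but each level is marked
// always_inline (a GCC/Clang attribute) so the calls collapse into
// straight-line NOPs rather than a chain of nested function calls.
template<int N>
__attribute__((always_inline)) inline void nop()
{
    nop<N - 1>();
    asm("nop");
}

// Base case ends the recursion and emits nothing.
template<>
__attribute__((always_inline)) inline void nop<0>() { }

// Hypothetical wrapper: executes 100 NOPs, then returns its argument.
int nops_inlined(int x)
{
    nop<100>();
    return x;
}

// Sanity check: the function still returns normally.
void check_nops_inlined()
{
    assert(nops_inlined(3) == 3);
}
```

For 1000 NOPs, the instantiation-depth limit still applies, so nop<1000>() would again need a raised -ftemplate-depth.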


In addition to the answer by @BenJackson: the recursion can get by with far less depth by using (binary) division.

template<unsigned int N> inline void nop()
{
    nop<N/2>();
    nop<N/2>();
    nop<N-2*(N/2)>();
}

template<> inline void nop<0>() { }
template<> inline void nop<1>() { asm("nop"); }
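
To convince yourself that the split really yields N nops at a small depth, the recursion can be mirrored in constexpr functions and checked at compile time. This is a sketch: nop_count, split_depth and nops_split are hypothetical names added for illustration, and only the nop template comes from the answer above.

```cpp
#include <cassert>

// Binary-split recursion from the answer above.
template<unsigned int N> inline void nop()
{
    nop<N/2>();
    nop<N/2>();
    nop<N-2*(N/2)>();   // remainder: 0 for even N, 1 for odd N
}

template<> inline void nop<0>() { }
template<> inline void nop<1>() { asm("nop"); }

// Mirrors the recursion: how many NOPs nop<n>() expands to.
constexpr unsigned nop_count(unsigned n)
{
    return n <= 1 ? n : 2 * nop_count(n / 2) + nop_count(n - 2 * (n / 2));
}

// Nesting depth of the call chain, counting the nop<1> leaf as level 1.
constexpr unsigned split_depth(unsigned n)
{
    return n <= 1 ? 1 : 1 + split_depth(n / 2);
}

static_assert(nop_count(1000) == 1000, "split must yield exactly N NOPs");
static_assert(split_depth(1000) == 10, "depth grows with log2(N)");

// Hypothetical wrapper; no -ftemplate-depth flag is needed at this depth.
void nops_split()
{
    nop<1000>();
}
```

As with the linear version, whether the calls collapse into a flat run of nops depends on the compiler inlining them, so optimization (or a forced-inline attribute) is still needed for that.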