Admittedly, this is a somewhat silly question. I am wondering whether Intel processors provide any special mechanism for efficiently executing a series of dummy instructions, i.e., NOPs. For instance, I could imagine some kind of prefetch mechanism that identifies NOPs, discards them, and tries to fetch useful instructions instead. Or are NOPs dispatched to the execution units like normal instructions, meaning that I can process roughly 5 NOPs per cycle (assuming there are 5 execution units)?
Thanks, Reinhard
Discarding them would be a pretty bad idea: they are often used for busy-waiting. If the processor discarded NOPs, your wait loop would become much tighter than intended and could introduce considerable communication overhead.
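To make the busy-waiting point concrete, here is a minimal sketch of a NOP-based delay loop. It assumes a GCC- or Clang-compatible compiler (for the inline assembly syntax), and the helper name spin_nops is made up for illustration. The delay it produces depends entirely on the processor actually executing each NOP.

```c
#include <stdint.h>

/* Hypothetical helper (not from any library): executes `iterations`
 * NOP instructions in a loop and returns how many were issued.
 * Assumes GCC/Clang-style inline assembly. */
static inline uint64_t spin_nops(uint64_t iterations)
{
    uint64_t done = 0;
    for (uint64_t i = 0; i < iterations; ++i) {
        /* `volatile` keeps the compiler from deleting the NOP. */
        __asm__ volatile ("nop");
        ++done;
    }
    return done;
}
```

If the hardware silently dropped the NOPs, this loop would collapse to little more than the counter increment and the branch, and the caller's timing assumptions would no longer hold.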
If you feel that NOPs are inefficient, you could try HLT, which saves some energy. Or you could even put the CPU into a sleep state. However, these only make sense if you want to "do nothing" for a considerable amount of time, and they usually require supervisor privileges.
No. They are decoded and executed as normal instructions; there is hardware support to remove the false dependency that would otherwise be introduced on the EAX register for the single-byte NOP, 0x90 (which is really xchg eax, eax), but that's all.
Reference: Intel(R) 64 and IA-32 Architectures Optimization Reference Manual - section 3.5.1.8, "Using NOPs".
There's very little need for optimizing sequences of no-ops on the x86 architecture because it has no-op encodings of varying lengths. Instead of many one-byte no-ops, one can just use a single multi-byte no-op. Somewhat more work for the decoder, but the actual execution units only see a single instruction to execute.
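As a rough sketch of what that looks like in practice, the following C snippet pads a buffer with the fewest possible no-op instructions. The byte sequences are the multi-byte NOP encodings as I recall them from Intel's recommended table (lengths 1 to 9), so check them against the current manual before relying on them. The names NOP_TABLE and fill_nops are made up for this example.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NOP_LEN 9

/* Multi-byte NOP encodings, indexed by (length - 1).
 * Assumption: these match Intel's recommended sequences. */
static const unsigned char NOP_TABLE[MAX_NOP_LEN][MAX_NOP_LEN] = {
    {0x90},                                                 /* 1 byte  */
    {0x66, 0x90},                                           /* 2 bytes */
    {0x0F, 0x1F, 0x00},                                     /* 3 bytes */
    {0x0F, 0x1F, 0x40, 0x00},                               /* 4 bytes */
    {0x0F, 0x1F, 0x44, 0x00, 0x00},                         /* 5 bytes */
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},                   /* 6 bytes */
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},             /* 7 bytes */
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},       /* 8 bytes */
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00}, /* 9 bytes */
};

/* Hypothetical helper: fills buf[0..len) with NOP padding, always
 * using the longest encoding that still fits. Returns the number
 * of instructions written. */
static size_t fill_nops(unsigned char *buf, size_t len)
{
    size_t count = 0;
    while (len > 0) {
        size_t chunk = len < MAX_NOP_LEN ? len : MAX_NOP_LEN;
        memcpy(buf, NOP_TABLE[chunk - 1], chunk);
        buf += chunk;
        len -= chunk;
        ++count;
    }
    return count;
}
```

Padding 16 bytes this way yields two instructions (a 9-byte and a 7-byte no-op) instead of sixteen single-byte ones, which is roughly what assemblers do when they emit alignment padding.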