开发者

Why is a 16-bit register used with BSR instruction in this code snippet?

开发者 https://www.devze.com 2022-12-12 16:50 出处:网络
In this hardcore article there\'s a function find_maskwidth() that basically detects the number of bits required to represent itemCount dictinct values:

In this hardcore article there's a function find_maskwidth() that basically detects the number of bits required to represent itemCount dictinct values:

开发者_如何学JAVAunsigned int find_maskwidth( unsigned int itemCount )
{
    unsigned int maskWidth, count = itemCount;
    __asm {
        mov eax, count
        mov ecx, 0
        mov maskWidth, ecx
        dec eax
        bsr cx, ax
        jz next
        inc cx
        mov maskWidth, ecx
    next:
    }
    return maskWidth; 
} 

the question is why do they use ax and cx registers instead of eax and ecx?


I can only guess that they expect to only ever deal with fields that have no more than 16 bits. Since it's being used to determine the number of bits that can be used to report things like the number of cores or logical processors in a package it'll probably be a while before it overflows above 65535.

I just hope someone doesn't decide to use the routine for a more general purpose.

And just an FYI - if you want to do something like this without dropping to x86 assembly (though I guess for the article's purpose, being non-portable is pretty much a given), the Bit Twiddling Hacks page has you covered.


I guess the original data is only 16-bits wide. Since cx is getting a bit number, there is no chance that it could be greater than a very small number anyway.

It's interesting to note that at one level the opcodes for 16-bit and 32-bit ia32 instructions are the same except for the prefix byte, so it is more efficient to issue all-32 or all-16 bit instructions, depending on which mode you are in. I suppose that's why you asked the question...


Given the fact that this code is really really ill-written (there is absolutely no need for the maskWidth and count variables, for example - apart from making the code confusing) I guess you can rest assured that it is just another "bad thing" in this code.


The routine bascially determines the binary (base two) logarithm of itemCount.

It will deliver completly wrong values, if itemCount > 2^16. It does not saturate or something, it is just plain wrong. The input parameter is "unsigned int", which makes it even more wrong. So it will stop working at more 65536 cores.

My guess is, someone at Intel digged up some really ancient piece of code, dating back to 16 bit times, without really understanding it, and used it, because 65536 will be enough forever, exactly as 640k memory will be sufficient forever or as two-digit year numbers will suffice until the end of time.


I'd say it's because the author of this code probably didn't really know what he was doing :-). The 16-bit versions of these instructions are longer, and not any faster. In fact, they will probably cause a partial register stall on the next instruction that uses ECX (i.e. the MOV).

Also note that the jump can be safely moved one instruction earlier (after the DEC), as the DEC already sets ZF when its output is zero. This can simplify the code a bit.

So this is how I would write this code snippet:

  mov eax, [count]
  xor ecx, ecx
  dec eax
  jz next
  bsr ecx, eax
  inc ecx
next:
  mov [maskWidth], ecx

Also, the motivation for dropping to assembly here seems to be using the BSR instruction, which does not have any equivalent in the C language or library. You can avoid using assembly by using a compiler-specific intrinsic function for this purpose. While these are inherently nonportable, neither is inline assembly.

In GCC the equivalent function would look like this:

unsigned int find_maskwidth(unsigned int itemCount)
{
   if(itemCount <= 1)
      return 0;
   else
      return 32 - __builtin_clz(itemCount - 1);
}

Much more readable, isn't it?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号