The problem is: given an integer val1
find the position of the highest bit set (Most Significant Bit) then, given a second integer val2
find a contiguous region of unset bits to the left of the position yielded from the first integer. width
specifies the minimum number of unset bits that must be found in contiguity (ie width
zeros without ones within them).
Here is the C code for my solution:
#include <limits.h> /* for CHAR_BIT - number of bits in a char */
typedef unsigned int t;
unsigned const t_bits = sizeof(t) * CHAR_BIT;
_Bool test_fit_within_left_of_msb( unsigned width,
t val1, /* integer to find MSB of */
t val2, /* integer to find width zero bits in */
unsigned* offset_result)
{
unsigned offbit = 0; /* 0 starts at high bit */
unsigned msb = 0;
t mask;
t b;
while(val1 >>= 1) /* find MSB! */
++msb;
while(offbit + width < t_bits - msb)
{
/* mask width bits starting at offbit */
mask = (((t)1 << width) - 1) << (t_bits - width - offbit);
b = val2 & mask;
if (!b) /* result! no bits set, we can use this */
{
*offset_result = offbit;
return true;
}
if (offbit++) /* this conditional bothers me! */
b <<= offbit - 1;
while(b <<= 1)
offbit++; /* increment offbit past all bits set */
}
return false; /* no region of width zero bits found, bummer. */
}
Aside from faster ways of finding the MSB of the first integer, the commented test for a zero offbit
seems a bit extraneous, but necessary to skip the highest bit of type t
if it is set. Unconditionally left shifting b
by offbit - 1
bits will result in an infinite loop and the mask never gets past the 1 in the high bit of val2 (otherwise, if the high bit is zero no problem).
I have also implemented similar algorithms but working to the right of the MSB of the first number, so they don't require this seemingly extra condition.
How can I get rid of this extra condition, or even, are there far more optimal solutions?
Edit: Some background not strictly required. The offset result is a count of bits from the high bit, not from the low bit as maybe expected. This will be part of a wider algorithm which scans a 2D array for a 2D area of zero bits.
Here, for testing, the algorithm has been simplified. val1
represents the first integer which does not have all bits set found in a row of the 2D array. From this the 2D version would scan down which is what val2
represents.
Here's some output showing success and failure:
t_bits:32
t_high: 10000000000000000000000000000000 ( 2147483648 开发者_Python百科)
---------
-----------------------------------
*** fit within left of msb test ***
-----------------------------------
val1: 00000000000000000000000010000000 ( 128 )
val2: 01000001000100000000100100001001 ( 1091569929 )
msb: 7
offbit:0 + width: 8 = 8
mask: 11111111000000000000000000000000 ( 4278190080 )
b: 01000001000000000000000000000000 ( 1090519040 )
offbit:8 + width: 8 = 16
mask: 00000000111111110000000000000000 ( 16711680 )
b: 00000000000100000000000000000000 ( 1048576 )
offbit:12 + width: 8 = 20
mask: 00000000000011111111000000000000 ( 1044480 )
b: 00000000000000000000000000000000 ( 0 )
offbit:12
iters:10
***** found room for width:8 at offset: 12 *****
-----------------------------------
*** fit within left of msb test ***
-----------------------------------
val1: 00000000000000000000000001000000 ( 64 )
val2: 00010000000000001000010001000001 ( 268469313 )
msb: 6
offbit:0 + width: 13 = 13
mask: 11111111111110000000000000000000 ( 4294443008 )
b: 00010000000000000000000000000000 ( 268435456 )
offbit:4 + width: 13 = 17
mask: 00001111111111111000000000000000 ( 268402688 )
b: 00000000000000001000000000000000 ( 32768 )
***** mask: 00001111111111111000000000000000 ( 268402688 )
offbit:17
iters:15
***** no room found for width:13 *****
(iters is the count of iterations of the inner while loop, b is result val2 & mask
)
This http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogObvious has several ways to calcuate the unsigned integer log base 2 of an unsigned integer (which is also the position of the highest bit set).
I think that this is part of what you want. I suspect that if I really knew what you want I could suggest a better way of calculating it or something that served the same purpose.
count_leading_zero_bits is often a single instruction that the compiler will provide an inline function for. otherwise put it in a loop.
count_trailing_zero_bits can use count_leading_zero_bits(x&-x) or a debruijn lookup if the former is a loop.
for simplicity I assume 32 bit values.
int offset_of_zero_bits_over_msb_of_other_value( unsigned width , unsigned value , unsigned other ) {
int count = 0;
int offset = -1;
int last = 1;
int lz = count_leading_zero_bits( other );
other |= ((1<<(32-lz2))-1); // set all bits below msb
if ( value & ~other ) {
value |= other; // set all bits below msb of other
value = ~value; // invert so zeros are ones
while ( value && count < width ) {
count += 1; // the widest run of zeros
last = value; // for counting trailing zeros
value &= value >> 1; // clear leading ones from groups
}
offset = count_trailing_zero_bits( last );
} else {
count = lz2;
offset = 32 - lz2;
}
return ( count < width ) ? -1 : offset;
}
The idea behind the code is this:
val1: 00000000000000000000000010000000 ( 128 )
val2: 01000001000100000000100100001001 ( 1091569929 )
lz1: 24
lz2: 1
val2: 01000001000100000000100011111111 // |= ((1<<(32-lz1))-1);
val2: 10111110111011111111011100000000 // = ~val2
val2: 00011110011001111111001100000000 // &= val2>>1 , count = 1
val2: 00001110001000111111000100000000 // &= val2>>1 , count = 2
val2: 00000110000000011111000000000000 // &= val2>>1 , count = 3
val2: 00000010000000001111000000000000 // &= val2>>1 , count = 4
val2: 00000000000000000111000000000000 // &= val2>>1 , count = 5
val2: 00000000000000000011000000000000 // &= val2>>1 , count = 6
val2: 00000000000000000001000000000000 // &= val2>>1 , count = 7
val2: 00000000000000000000000000000000 // &= val2>>1 , count = 8
So at each step, all ranges of zeros, now ones, are shrunken from the right. When the value is zero, the number of steps taken is the width of the widest run. At any point, counting the number of trailing zeros will give the offset to the nearest range of at least count
zeros.
If at any point count exceeds width, you can stop iterating. The maximum number of iteration is thus width, not the word size. You could make this O(log n) of width, because you can double the shift amount at each iteration as long as you do not exceed width.
Here is a DeBruijn lookup for counting trailing zero bits for 32 bit values.
static const int MultiplyDeBruijnBitPosition[32] = {
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition[((uint32_t)((v & -v) * 0x077CB531U)) >> 27];
I noticed that in both your examples, val1 had only a single bit set. If that is the case, you can use the DeBruijn trick to find the MSB.
Here is my new and improved algorithm:
int test_fit_within_left_of_msb( unsigned width,
unsigned val1,
unsigned val2 )
{
int offset = 32;
int msb = 0;
unsigned mask;
unsigned b;
msb = 32 - __builtin_clz(val1); /* GCC builtin to count Leading Zeros */
while(offset - width > msb)
{
mask = (((unsigned)1 << width) - 1) << (offset - width);
b = val2 & mask;
if (!b)
return 32 - offset;
offset = __builtin_ctz(b); /* GCC builtin to Count Trailing Zeros */
}
return -1;
}
This code has a lot of improvements over my initial implementation. Primarily, the removal of the inner while
loop by simply counting the trailing zero bits. Secondly, I have also made the algorithm work with an offset that uses the natural bit position values and thus removed some of the addition and subtraction operations my original used, until the successful return statement. You could nit pick over subtracting the offset from 32.
The important point here in the code, is the algorithm - I realize there are portability concerns and assumptions made about types and there sizes. Looking back up the page to the output where width 8 can be found at position 12 performed in 10 iterations, the new alogirthm does the same in 2 iterations of the loop.
I've used the GCC builtins for convenience here, the MultiplyDeBruijnBitPosition code that drawonward provided ( from: http://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightMultLookup ) can be used to replace __builtin_ctz, while __bultin_clz could be replaced with one of the integer log base 2 codes from the same page.
One concern here though, is the data (with sparsely set bits) I've used to test this with makes this algorithm perform better, this will maybe not be so great looking at integers with more densely set bits. (Not correct - by counting the trailing zeros it avoids this bad case).
After implementing my previous answer but to work for the right of the MSB, I saw that aside from a very minor difference, the left and right versions were exactly the same. This lead to the realization there is no real requirement for the algorithm to be working with an MSB from some prior value at all.
So although this answer does not meet the specifications of the question, it is the correct answer because the specification was incorrect.
#include<stdint.h>
/* returns bit position within a 32bit integer, where
a region of contiguous zero bits can be found whose
count is equal to or greater than width. it returns
-1 on failure.
*/
int binary_width_fit( unsigned width, uint32_t val )
{
int offset = 32;
uint32_t mask;
uint32_t b;
while(offset >= width)
{
mask = (((uint32_t)1 << width) - 1) << (offset - width);
b = val & mask;
if (!b)
return offset;
offset = __builtin_ctz(b); /* GCC builtin to Count Trailing Zeros */
}
return -1;
}
1 (fast) method is to use precalculated LOOKUP TABLES (LUTs) for each 8bit byte:
PosOfFirst1, PosOfLast1, PosOfFirst0, PosOfLast0 -- all arrays of 256 bytes
Precalculate the tables using: (soz for poor, pascalish pseudocode)
PosOfLast1:
FOR EACH ByteVal (0..255):
if byteVal>127 return 8
elseif byteVal>63 return 7
...
elseif byteVal>0 return 1
else return 0
PosOfFirst1:
c:=0;
while c<8 do
begin
bv = byteVal and 1;
if bv=1 then return c
else byteval shr 1;
inc (c);
end;
I use simple assembler procs for these algs. PosOfFirst0 and PosOfLast0 LUTs can be pre-calced using these 2 tables also - as can TRAILING & LEADING 0 (or 1) count. It's useful to pre-calc 'minus 1' versions of these tables too....
You can then use (for 8bit bytes) var InputByte: Byte; FirstBit:=PosOfFirst1[InputByte] // v.fast
For larger sizes (0, 16, 24, 32 +++++) use procs and loops that check each constituent 8bit byte. Memory access to the LUT may be needed but this method is still faster:
a) Can be used easily without needing a procedure call. b) scanning a 32 bit number takes 1 shift & comparison to 0 per byte with 1 lookup needed (if a non-zero byte is found) instead of n (0..32) shifts, ands and comparisons... c) if programmed well will halt after finding 1st/last 1
The LUT principle applies to 'population count' + other bit manip. routines...
Cheers, PrivateSi
FASTER IS BETTER?!
精彩评论