De Bruijn-like sequence for `2^n - 1`: how is it constructed?_问答_开发者

I'm looking at the entry Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup from Bit Twiddling hacks.

I can easily see how the second algorithm in that entry works

static const int MultiplyDeBruijnBitPosition2[32] = 
{
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8, 
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
r = MultiplyDeBruijnBitPosition2[(uint32_t)(v * 0x077CB531U) >> 27];

which calculates n = log2 v where v is known to be a power of 2. In this case 0x077CB531 is an ordinary De Bruijn sequence, and the rest is obvious.

However, the first algorithm in that entry

static const int MultiplyDeBruijnBitPosition[32] =
{
  0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
  8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31
};

v |= v >> 1;
v |= v >> 2;
v |= v >> 4;
v |= v >> 8;
v |= v >> 16;

r = MultiplyDeBruijnBitPosition[(uint32_t)(v * 0x07C4ACDDU) >> 27];

looks a bit more tricky to me. We begin by snapping v to a nearest greater 2^n - 1 value. This 2^n - 1 value is then multiplied by 0x07C4ACDD, which in this case acts the same way as the DeBruijn sequence in the previous algorithm did.

My question is: how do we construct this magic 0x07C4ACDD sequence? I.e. how do we construct a sequence that can be used to generate unique indices when multiplied by a 2^n - 1 value? For 2^n multiplier it is just an ordinary De Bruijn sequence, as we can see above, so it is clear where 0x077CB531 came from. But what about 2^n - 1 multiplier 0x07C4ACDD? I feel like I'm missing something obvious here.

P.S. To clarify my question: I'm not really looking for an algorithm to generate these sequences. I'm mor开发者_JS百科e interested in some more-or-less trivial property (if one exists) that makes 0x07C4ACDD work as we want it to work. For 0x077CB531 the property that makes it work is pretty obvious: it contains all 5-bit combinations "stored" in the sequence with 1-bit stepping (which is basically what De Bruijn sequence is).

The 0x07C4ACDD, on the other hand, is not a De Bruijn sequence by itself. So, what property were they aiming for when constructing 0x07C4ACDD (besides the non-constructive "it should make the above algorithm work")? Someone did come up with the above algorithm somehow. So they probably knew that the approach is viable, and that the appropriate sequence exists. How did they know that?

For example, if I were to construct the algorithm for an arbitrary v, I'd do

v |= v >> 1;
v |= v >> 2;
...

first. Then I'd just do ++v to turn v into a power of 2 (let's assume it doesn't overflow). Then I'd apply the first algorithm. And finally I'd do --r to obtain the final answer. However, these people managed to optimize it: they eliminated the leading ++v and the trailing --r steps simply by changing the multiplier and rearranging the table. How did they know it was possible? What is the math behind this optimization?

A De Bruijn sequence of order n over k symbols (and of k^n length) have a property that every possible n-length word appears as consecutive characters in it, some of them with cyclic wrapping. For example, in the case of k=2, n=2, the possible words are 00, 01, 10, 11, and a De Bruijn sequence is 0011. 00, 01, 11 appears in it, 10 with wrapping. This property naturally means that left shifting a De Bruijn sequence (multiplying with power of two) and taking its upper n bits results in a unique number for each power of two multiplier. Then you only need a lookup table to determine which one it is. It works on a similar principle to numbers which are one less than power of two, but the magic number in this case is not a De Bruijn sequence, but an analogue. The defining property simply changes to "every possible n-length word appears as the sum of the first m subsequences of length n, mod 2^n". This property is all that is needed for the algorithm to work. They simply used this different class of magic numbers to speed up the algorithm. I did as well.

One possible method of construction of De Bruijn numbers is the generation of a Hamiltonian path of the De Bruijn graph, Wikipedia provides an example of such a graph. In this case, the nodes are 2^5=32-bit integers, the directed edges are transitions between them, where a transition is a left shift and a binary or operation according to the label of the edge, 0 or 1. There might be a direct analogue to 2^n-1 type magic numbers, it might be worth exploring, but this isn't a way people usually construct such algorithms.

In practice you might try to construct it differently, especially if you want it to behave in a tad different manner. For example, the implementation of leading/trailing number of zeros algorithms on the bit twiddling hacks page can only return values in [0..31]. It needs additional checking for the case of 0, which has 32 zeros. This check requires a branching and can be way too slow on some CPUs.

The way I did it, I used a 64 element lookup table instead of 32, generated random magic numbers, and for each of them I built up a lookup table with power of two inputs, checked its correctness (injectivity), then verified it for all 32-bit numbers. I continued till I encountered a correct magic number. The resulting numbers do not fulfill a property of "every possible n-length word appears", since only 33 numbers appear, which are unique for all 33 possible input.

Exhaustive brute force search sounds slow, especially if good magic numbers are rare, but if we first test known power of two values as inputs, the table is filled quickly, rejection is fast, and the rejection rate is very high. We only need to clear the table after each magic number. In essence I exploited a high rejection rate algorithm to construct magic numbers.

The resulting algorithms are

int32 Integer::numberOfLeadingZeros (int32 x)
{
    static int32 v[64] = {
        32, -1, 1, 19, -1, -1, -1, 27, -1, 24, 3, -1, 29, -1, 9, -1,
        12, 7, -1, 20, -1, -1, 4, 30, 10, -1, 21, -1, 5, 31, -1, -1,
        -1, -1, 0, 18, 17, 16, -1, -1, 15, -1, -1, -1, 26, -1, 14, -1,
        23, -1, 2, -1, -1, 28, 25, -1, -1, 13, 8, -1, -1, 11, 22, 6};
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    x *= 0x749c0b5d;
    return v[cast<uint32>(x) >> 26];
}

int32 Integer::numberOfTrailingZeros (int32 x)
{
    static int32 v[64] = {
        32, -1, 2, -1, 3, -1, -1, -1, -1, 4, -1, 17, 13, -1, -1, 7,
        0, -1, -1, 5, -1, -1, 27, 18, 29, 14, 24, -1, -1, 20, 8, -1,
        31, 1, -1, -1, -1, 16, 12, 6, -1, -1, -1, 26, 28, 23, 19, -1,
        30, -1, 15, 11, -1, 25, 22, -1, -1, 10, -1, 21, 9, -1, -1, -1};
    x &= -x;
    x *= 0x4279976b;
    return v[cast<uint32>(x) >> 26];
}

As for your question of how did they know, they probably didn't. They experimented, tried to change things, just like me. After all, it isn't a big stretch of imagination that 2^n-1 inputs might work instead of 2^n inputs with different magic number and lookup table.

Here, I made a simplified version of my magic number generator code. It checks all possible magic numbers in 5 minutes if we check only for power of two inputs, finding 1024 magic numbers. Checking against other inputs are pointless since they are reduced to 2^n-1 form anyway. Does not construct the table, but it is trivial once you know the magic number.

#include <Frigo/all>
#include <Frigo/all.cpp>

using namespace Frigo::Lang;
using namespace std;

class MagicNumberGenerator
{

    public:

        static const int32 log2n = 5;
        static const int32 n = 1 << log2n;
        static const bool tryZero = false;

        MagicNumberGenerator () {}

        void tryAllMagic ()
        {
            for( int32 magic = 0; magic < Integer::MAX_VALUE; magic++ ){
                tryMagic(magic);
            }
            tryMagic(Integer::MAX_VALUE);
            for( int32 magic = Integer::MIN_VALUE; magic < 0; magic++ ){
                tryMagic(magic);
            }
        }

        bool tryMagic (int32 magic)
        {
            //  clear table
            for( int32 i = 0; i < n; i++ ){
                table[i] = -1;
            }
            //  try for zero
            if( tryZero and not tryInput(magic, 0) ){
                return false;
            }
            //  try for all power of two inputs, filling table quickly in the process
            for( int32 i = 0; i < 32; i++ ){
                if( not tryInput(magic, 1 << i) ){
                    return false;
                }
            }
            //  here we would test all possible 32-bit inputs except zero, but it is pointless due to the reduction to 2^n-1 form
            //  we found a magic number
            cout << "Magic number found: 0x" << Integer::toHexString(magic) << endl;
            return true;
        }

        bool tryInput (int32 magic, int32 x)
        {
            //  calculate good answer
            int32 leadingZeros = goodNumberOfLeadingZeros(x);
            //  calculate scrambled but hopefully injective answer
            x |= x >> 1;
            x |= x >> 2;
            x |= x >> 4;
            x |= x >> 8;
            x |= x >> 16;
            x *= magic;
            x = Integer::unsignedRightShift(x, 32 - log2n);
            //  reject if answer is not injective
            if( table[x] != -1 ){
                return table[x] == leadingZeros;
            }
            //  store result for further injectivity checks
            table[x] = leadingZeros;
            return true;
        }

        static int32 goodNumberOfLeadingZeros (int32 x)
        {
            int32 r = 32;
            if( cast<uint32>(x) & 0xffff0000 ){
                x >>= 16;
                r -= 16;
            }
            if( x & 0xff00 ){
                x >>= 8;
                r -= 8;
            }
            if( x & 0xf0 ){
                x >>= 4;
                r -= 4;
            }
            if( x & 0xc ){
                x >>= 2;
                r -= 2;
            }
            if( x & 0x2 ){
                x >>= 1;
                r--;
            }
            if( x & 0x1 ){
                r--;
            }
            return r;
        }

        int32 table[n];

};

int32 main (int32 argc, char* argv[])
{
    if(argc||argv){}
    measure{
        MagicNumberGenerator gen;
        gen.tryAllMagic();
    }
}

It's based on the paper Using de Bruijn Sequences to Index a 1 in a Computer Word. I'm going to guess that they did a search for a perfect hash function to map 2^n-1 to [0..31]. They describe a method for searching for counting zeroes of integers with up to two bits set which involves incrementally building up the multiplier.

From: http://www.stmintz.com/ccc/index.php?id=306404

130329821
0x07C4ACDD
00000111110001001010110011011101B

bit 31 - bit 27   00000  0
bit 30 - bit 26   00001  1
bit 29 - bit 25   00011  3
bit 28 - bit 24   00111  7
bit 27 - bit 23   01111 15
bit 26 - bit 22   11111 31
bit 25 - bit 21   11110 30
bit 24 - bit 20   11100 28
bit 23 - bit 19   11000 24
bit 22 - bit 18   10001 17
bit 21 - bit 17   00010  2
bit 20 - bit 16   00100  4
bit 19 - bit 15   01001  9
bit 18 - bit 14   10010 18
bit 17 - bit 13   00101  5
bit 16 - bit 12   01010 10
bit 15 - bit 11   10101 21
bit 14 - bit 10   01011 11
bit 13 - bit  9   10110 22
bit 12 - bit  8   01100 12
bit 11 - bit  7   11001 25
bit 10 - bit  6   10011 19
bit  9 - bit  5   00110  6
bit  8 - bit  4   01101 13
bit  7 - bit  3   11011 27
bit  6 - bit  2   10111 23
bit  5 - bit  1   01110 14
bit  4 - bit  0   11101 29
bit  3 - bit 31   11010 26 
bit  2 - bit 30   10100 20
bit  1 - bit 29   01000  8
bit  0 - bit 28   10000 16

It seems to me that 0x07C4ACDD is a 5-bit de Bruijn sequence.

The answer to your question is perhaps a bit simpler than expected. First, both 0x077CB531 and 0x07C4ACDD are de Bruijn sequences. Let's call them B and C. Now let's look at finding the top bit of n - 1 where n is a power of 2 (thus non-zero). Note that:

We calculate (n - 1) * C which is the same as n * C - C.
We only look at the top 5 bits of n * C - C.
n * C is a left-shifted version of C.
The top 5 bits of C are zero: subtracting C might decrement the top 5 bits of n * C by 1 if C is larger than the bottom 27 bits of n * C.

Here's what's special about C: it was chosen such that it is always larger than those bottom 27 bits. C was chosen such that after its top 5 bits which are zero, the next 5 bits are all 1s.

Every k-bit de Bruijn sequence has exactly one set of k consecutive 0s and one set of k consecutive 1s and none longer; however, these two subsequences are not adjacent in all de Bruijn sequences. They are adjacent in C.

Assuming n is non-zero, the rotated n * C never has 5 consecutive 1s in those next 5 bits: it always has at least a zero there, so the subtraction by C always results in decrementing the top 5 bits of n * C. In other words, using n - 1 instead of n doesn't really change anything, except rotate the lookup table by one entry.

Generating such de Bruijn sequences

One approach is to simply apply the above constraint to any method of generating de Bruijn sequences. Here's a simple example using LFSRs. While LFSRs generate de Bruijn sequences of essentially any bit length, they only find a few with the above constraint, so this is illustrative only.

Linear feedback shift registers (LFSRs) behave much like de Bruijn sequences: a maximal-period n-bit LFSR produces a periodic sequence of (2^n)-1 bits, for which the last n bits at any point cycles through all n-bit numbers (except one, typically zero). This zero is trivially added back by adding a zero bit in the LFSR output sequence at the point where it outputs n-1 consecutive zeroes (which it must do, covering all n-bit numbers save zero).

Here's example C code to find all maximal-period Galois LFSRs of a given range of bit widths, and display the corresponding de Bruijn sequence by adding the missing zero. It turns out trivial to do by starting the LFSR with the top bit set when doing right shifts (starting with the value 1 ends up with zeroes at the end instead of at the beginning). It also displays FOUND if the sequence meets the constraints stated earlier.

#include <stdio.h>

int lfsr_period(int width, int taps, int show)
{
    int max = 1 << (width - 1), n = max;
    int period = 0, lastbit = 0, adjacent = 1;
    do {
        /* Compute LFSR */
        int bit = n & 1;
        n >>= 1;
        if (bit)
            n ^= taps;

        period++;
        if (show)
            printf("%d", bit);
        if (lastbit && !bit && period < width * 2)
            adjacent = 0;
        lastbit = bit;
    } while (n != max);
    if (show && adjacent)
        printf(" FOUND");
    return period;
}

int main()
{
    for (int width = 2; width <= 12; width++) {
        printf("%d bits:\n", width);
        int max = 1 << width;
        for (int taps = max / 2; taps < max; taps++) {
            int period = lfsr_period(width, taps, 0);
            if (period == max - 1) {
                printf("0x%X: 0", taps);
                lfsr_period(width, taps, 1);
                printf("\n");
            }
        }
    }
    return 0;
}

For example, it finds the following sequences:

2 bits:
0x3: 0011 FOUND
3 bits:
0x5: 00011101 FOUND
0x6: 00010111
4 bits:
0x9: 0000111101011001 FOUND
0xC: 0000100110101111
5 bits:
0x12: 00000101011101100011111001101001
0x14: 00000100101100111110001101110101
0x17: 00000110010011111011100010101101
0x1B: 00000110101001000101111101100111
0x1D: 00000111001101111101000100101011
0x1E: 00000101101010001110111110010011
6 bits:
0x21: 0000001111110101011001101110110100100111000101111001010001100001 FOUND
0x2D: 0000001110000100100011011001011010111011110011000101010011111101
0x30: 0000001000011000101001111010001110010010110111011001101010111111
0x33: 0000001101110011000111010111111011010001000010110010101001001111
0x36: 0000001011111100101010001100111101110101101001101100010010000111
0x39: 0000001111001001010100110100001000101101111110101110001100111011
7 bits:
0x41: 00000001111111010101001100111011101001011000110111101101011011001001000111000010111110010101110011010001001111000101000011000001 FOUND
0x44: 00000001001001101001111011100001111111000111011000101001011111010101000010110111100111001010110011000001101101011101000110010001
[...]

Looking carefully, you might notice it provides symmetrical (reversed) pairs of sequences. Also, none match the other de Bruijn sequences already mentioned, whether shifted, inverted or reversed: while LFSRs find several de Bruijn sequences, they certainly do not find all of them. They also don't find constrained sequences for 5 bits and very few above 7 bits; the ones it does find seem to always consist of two taps, at the top and bottom bits.