Bits and Bytes: Store a shuffle instruction_问答_开发者

Given a byte array with a length of two we have two possibilities for a shuffle. 01 and 10

A length of 3 would allow these shuffle options 012,021,102,120,102,201,210. Total of 2x3=6 options.

A length of 4 would have 6x4=24. Length of 5 would have 24x5=120 options, etc.

So once you have randomly picked one of these shuffle options, how do you store it? You could store 23105 to indicate how to shuffle four bytes.. But that takes 5x3=15 bits. I know it ca开发者_StackOverflow中文版n be done in 7 bits because there are only 120 possibilities.

Any ideas how to more efficiently store a shuffle instruction? It should be an algorithm that will scale in length.

Edit: See my own answer below before you post a new one. I am sure that there is good information in many of these already existing answers, but I just could not understand much of it.

If you have a well-ordering of the set of elements you are shuffling, then you can create a well-ordering for the set of all the permutations and just store a single integer representing which place in the order a permutation falls.

Example: Shuffling 1 4 5: the possibilities are:

1 4 5   [0]
1 5 4   [1]
4 1 5   [2]
4 5 1   [3]
5 1 4   [4]
5 4 1   [5]

To store the permutation 415, you would just store 2 (zero indexed).

If you have a well-ordering for the original set of elements, you can make a well-ordering for the set of permutations by iterating through the elements from least order to greatest for the leftmost element, while iterating through the leftover elements for the next place to the right and so on until you get to the rightmost element. You wouldn't need to store this array, you would just need to be able to generate the permutations in the same order again to "unpack" the stored integer.

However, attempting to generate all the permutations one by one will take a considerable amount of time beyond the smallest of sets. You can use the observation that the first (N-1)! permutations start with the 1st element, the second (N-1)! permutations start with the second element, then for each permutation that starts with a specific element, the 1st (N-2)! permutations start with the first of the leftover elements and so on and so forth. This will allow you to "pack" or "unpack" the elements in O(n), excepting the complexity of actually generating the factorials and the division and modulus of arbitrary length integers, which will be somewhat substantial.

You are right that to store just a permutation of data, and not the data itself, you will need only as many bits as ceil(log2(permutations)). For N items, the number of permutations is factorial(N) or N!, so you would need ceil(log2(factorial(N))) bits to store just the permutation of N items without also storing the items.

In whatever language you're familiar, there should be a ready way to make a big array of M bits, fill it up with a permutation, and then store it on a storage device.

A common shuffling algorithm, and one of the few unbiased ones, is the Fisher-Yates shuffle. Each iteration of the algorithm takes a random number and swaps two places based on that number. By storing a list of those random numbers, you can later reproduce the exact same permutation.

Furthermore, since the valid range for each of those numbers is known in advance, you can pack them all into a big integer by multiplying each number by the product of the lower number's valid ranges, like a kind of variable-base positional notation.

For an array of L items, why not pack the order into L*ceil(log2(L)) bits? (ceil(log2(L)) is the number of bits needed to hold L unique values). For example, here is the representation of the "unshuffled" shuffle, taking the items in order:

L=2:  0 1       (2 bits)
L=3:  00 01 10     (6 bits)
L=4:  00 01 10 11   (8 bits)
L=5:  000 001 010 011 100 (15 bits)
...
L=8:  000 001 010 011 100 101 110 111 (24 bits)
L=9:  0000 0001 0010 0011 0100 0101 0110 0111 1000 (36 bits)
...
L=16: 0000 0001 ... 1111  (64 bits)
L=128: 00000000 000000001 ... 11111111 (1024 bits)

The main advantage to this scheme compared to @user470379's answer, is that it is really easy to extract the indexes, just shift and mask. No need to regenerate the permutation table. This should be a big win for large L: (For 128 items, there are 128! = 3.8562e+215 possible permutations).

(Permutations == "possibilities"; factorial = L! = L * (L-1) * ... * 1 = exactly the way you are calculating possibilities)

This method also isn't that much larger than storing the permutation index. You can store a 128 item shuffle in 1024 bits (32 x 32-bit integers). It takes 717 bits (23 ints) to store 128!.

Between the faster decoding speed and the fact that no temporary storage is required for caclulating the permutation, storing the extra 9 ints may be well worth their cost.

Here is an implementation in Ruby that should work for arbitrary sizes. The "shuffle instruction" is contained in the array instruction. The first part calculates the shuffle using a version of the Fisher-Yates algorithm that @Theran mentioned

# Some Setup and utilities
sizeofInt = 32  # fix for your language/platform
N = 16
BitsPerIndex = Math.log2(N).ceil
IdsPerWord = sizeofInt/BitsPerIndex

# sets the n'th bitfield in array a to v
def setBitfield a,n,v
  mask = (2**BitsPerIndex)-1
  idx = n/IdsPerWord
  shift = (n-idx*IdsPerWord)*BitsPerIndex
  a[idx]&=~(mask<<shift)
  a[idx]|=(v&mask)<<shift
end

# returns the n'th bitfield in array a
def getBitfield a,n
  mask = (2**BitsPerIndex)-1
  idx = n/IdsPerWord
  shift = (n-idx*IdsPerWord)*BitsPerIndex
  return (a[idx]>>shift)&mask
end  

#create the shuffle instruction in linear time 
nwords = (N.to_f/IdsPerWord).ceil  # num words required to hold instruction
instruction = Array.new(nwords){0} # array initialized to 0

#the "inside-out" Fisher–Yates shuffle
for i in (1..N-1)
  j = rand(i+1)
  setBitfield(instruction,i,getBitfield(instruction,j))
  setBitfield(instruction,j,i)
end

#Here is a way to visualize the shuffle order
#delete ".reverse.map{|s|s.to_i(2)}" to visualize the way it's really stored
p instruction.map{|v|v.to_s(2).rjust(BitsPerIndex*IdsPerWord,'0').scan(
    Regexp.new('.'*BitsPerIndex)).reverse.map{|s|s.to_i(2)}}

Here is an example of applying the shuffle to an array of characters:

A=(0...N).map{|v|('A'.ord+v).chr}
puts A*''

#Apply the shuffle to A in linear time
for i in (0...N)
 print A[getBitfield(instruction,i)]
end
print "\n"

#example: for N=20, produces
> ABCDEFGHIJKLMNOPQRST
> MSNOLGRQCTHDEPIAJFKB

Hopefully this won't be too hard to convert to javascript, or any other language.

I am sorry if this was already covered in a previous answer,, but for the first time,, these answers are completely foreign to me. I might have mentioned that I know Java and JavaScript and that I know nothing of mathematics... So log2, permutations, factorial, well-ordering are all unknown words to me.

And on top of that I ended up (again) using StackOverflow as a white board to write out my question and answered the question in my head 20 minutes later. I was tied up in non computer life and,, knowing StackOverflow I figured it was too late to save more than 20% of everybody's easily wasted time.

Anyway, having gotten lost in all three existing answers. Here is the answer I know of

(written in Javascript but it should be easy to translate 20 lines of foreign code to your language of choice)

(see it in action here: http://jsfiddle.net/M3vHC)

Edit: Thanks to AShelly for this catch: This will fail (become highly biased) when given a key length of more than 12 assuming your ints are 32 bit (more than 19 if your ints are 64 bit)

var keyLength = 5
var possibilities = 1
for(var i = 0; i < keyLength ; i++)
    possibilities *= i+1 // Calculate the number of possibilities to create an unbiased key
var randomKey = parseInt(Math.random()*possibilities) // Your shuffle instruction. Random number with correct number of possibilities starting with zero as the first possibility
var keyArray = new Array(keyLength) // This will contain the new locations of existing indexes. [0,1,2,3,4] means no shuffle [4,3,2,1,0] means reverse order. etcetera
var remainsOfKey = randomKey // Our "working" key. This is disposible / single use.
var taken = new Array(keyLength) // Tells if an index has already been accounted for in the keyArray
for(var i = keyArray.length;i > 0;i--) { // The number of possibilities for the first item in the key array is the number of blanks in key array.
    var add = remainsOfKey % i + 1, remainsOfKey = parseInt(randomKey / i) // Grab a number at least zero and less then the number of blanks in the keyArray
    for(var j = 0; add; j++) // If we got x from the above line, make sure x is not already taken
        if(!taken[j])
            add--
    taken[keyArray[i-1] = --j] = true // Take what we have because it is right
}
alert('Based on a key length of ' + keyLength + ' and a random key of ' + randomKey + ' the new indexes are ... ' + keyArray.join(',') + ' !')