I'm looking for a PRNG (pseudorandom number generator) that you initially seed with an arbitrary array of bytes.
Heard of any?
Hashing your arbitrary-length seed (instead of using XOR, as paxdiablo suggested) ensures that two different seeds are extremely unlikely to collide: the probability equals that of a hash collision. With something such as SHA-1 or SHA-2, an accidental collision is a practical impossibility.
You can then use your hashed seed as the input to a decent PRNG such as my favourite, the Mersenne Twister.
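As a rough sketch of the hash-then-seed idea, here it is in Python. This assumes CPython, whose `random.Random` is a Mersenne Twister and accepts an integer seed of any size. The helper name `prng_from_bytes` is made up for illustration and the choice of SHA-256 is just one reasonable option.

```python
import hashlib
import random


def prng_from_bytes(seed: bytes) -> random.Random:
    """Return a generator seeded from an arbitrary-length byte string.

    Hypothetical helper: the name and the use of SHA-256 are assumptions.
    """
    # The digest is always 32 bytes, whatever the length of the input.
    digest = hashlib.sha256(seed).digest()
    # Interpret the digest as one large integer and use it as the seed.
    return random.Random(int.from_bytes(digest, "big"))


rng = prng_from_bytes(b"paxdiablo")
print([rng.randrange(100) for _ in range(5)])
```

The same bytes always give the same sequence, and seeds that differ in a single byte end up with unrelated digests.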
UPDATE
The Mersenne Twister implementation available here already seems to accept an arbitrary length key: http://code.msdn.microsoft.com/MersenneTwister/Release/ProjectReleases.aspx?ReleaseId=529
UPDATE 2
For a sense of just how unlikely a SHA-2 collision is, consider how hard someone has to work to attack even a reduced-round version. Quoting http://en.wikipedia.org/wiki/SHA_hash_functions#SHA-2 :
There are two meet-in-the-middle preimage attacks against SHA-2 with a reduced number of rounds. The first one attacks 41-round SHA-256 out of 64 rounds with time complexity of 2^253.5 and space complexity of 2^16, and 46-round SHA-512 out of 80 rounds with time 2^511.5 and space 2^3. The second one attacks 42-round SHA-256 with time complexity of 2^251.7 and space complexity of 2^12, and 42-round SHA-512 with time 2^502 and space 2^22.
Why don't you just XOR your arbitrary sequence into a type of the right length (padding it with part of itself if necessary)? For example, if you want the seed "paxdiablo" and your PRNG has a four-byte seed:
paxd  0x70617864
iabl  0x6961626c
opax  0x6f706178
      ----------
      0x76707b70 or 0x707b7076 (little-endian, as on Intel).
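The folding above could be written roughly like this in Python. The function name `xor_fold` and the big-endian interpretation of each block are assumptions made to match the worked example; the padding repeats the seed from its start, as in "paxdiablo" + "pax".

```python
from itertools import cycle, islice


def xor_fold(seed: bytes, width: int = 4) -> int:
    """XOR an arbitrary-length byte string down to a `width`-byte integer.

    Hypothetical helper: pads the seed with a prefix of itself up to a
    multiple of `width`, then XORs the big-endian blocks together.
    """
    if not seed:
        raise ValueError("seed must contain at least one byte")
    # Round the length up to the next multiple of `width`.
    padded_len = -(-len(seed) // width) * width
    padded = bytes(islice(cycle(seed), padded_len))
    acc = 0
    for i in range(0, padded_len, width):
        acc ^= int.from_bytes(padded[i:i + width], "big")
    return acc


print(hex(xor_fold(b"paxdiablo")))  # 0x76707b70
```

Because XOR is commutative, reordering the blocks gives the same result: `xor_fold(b"paxdiabl")` equals `xor_fold(b"iablpaxd")`. That is the kind of collision the hashing approach avoids.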
I know that seed looks artificial (and it is, since the key is chosen from alphabetic characters). If you really want to spread the seeds apart when the phrases are likely to come from a similar range, XOR the result again with a differentiator like 0xdeadbeef or 0xa55a1248:
paxd  0x70617864  0x70617864
iabl  0x6961626c  0x6961626c
opax  0x6f706178  0x6f706178
      0xdeadbeef  0xa55a1248
      ----------  ----------
      0xa8ddc59f  0xd32a6938
I prefer the second one since it will more readily move similar bytes into disparate ranges (the upper bits of the bytes in the differentiator are disparate).
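For completeness, the differentiator step is a single extra XOR. A small Python sketch, starting from the folded value in the worked example (the function name `differentiate` is made up for illustration, and the mask assumes a four-byte seed):

```python
FOLDED = 0x76707B70  # XOR-folded "paxdiablo" from the worked example


def differentiate(seed: int, differentiator: int) -> int:
    """XOR a 32-bit seed with a fixed constant (hypothetical helper)."""
    # Mask to 32 bits so the result still fits a four-byte seed.
    return (seed ^ differentiator) & 0xFFFFFFFF


for d in (0xDEADBEEF, 0xA55A1248):
    print(f"{d:#010x} -> {differentiate(FOLDED, d):#010x}")
# 0xdeadbeef -> 0xa8ddc59f
# 0xa55a1248 -> 0xd32a6938
```

The top bit of every byte in 0xdeadbeef is set, whereas in 0xa55a1248 it is set in only one byte, which is what moves the all-ASCII bytes into different ranges.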