
Efficient hash function for URIs

Developer https://www.devze.com 2023-02-04 06:39 Source: Internet
I am looking for a hash function to build a (global) fixed-size ID for strings, most of them URIs.

It should be:

  • fast
  • low chance of collision
  • ~64 bits
  • exploiting the structure of a URI, if that is possible?

Would http://murmurhash.googlepages.com/ be a good choice, or is there anything better suited?


Try MD4. As far as cryptography is concerned, it is "broken", but since you do not have any security concern (you want a 64-bit output size, which is too small to yield any decent security against collisions), that should not be a problem. MD4 yields a 128-bit value, which you just have to truncate to the size you wish.
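The "MD4, truncated to 64 bits" suggestion can be sketched in Python as below. Python's `hashlib.new("md4")` only works when the underlying OpenSSL provides MD4 (OpenSSL 3.0 moved it to the legacy provider), so this sketch is a self-contained pure-Python MD4 following RFC 1320. The names `md4` and `uri_id64` are hypothetical, and taking the first 8 digest bytes as a little-endian integer is an arbitrary choice; any fixed 8 bytes would do.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl(x: int, s: int) -> int:
    """Rotate a 32-bit value left by s bits."""
    x &= MASK32
    return ((x << s) | (x >> (32 - s))) & MASK32


def _f(x, y, z):  # round 1: "if x then y else z"
    return (x & y) | (~x & z)


def _g(x, y, z):  # round 2: majority
    return (x & y) | (x & z) | (y & z)


def _h(x, y, z):  # round 3: parity
    return x ^ y ^ z


# (boolean function, additive constant, message-word order, shift amounts)
_ROUNDS = (
    (_f, 0x00000000, tuple(range(16)), (3, 7, 11, 19)),
    (_g, 0x5A827999, (0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15), (3, 5, 9, 13)),
    (_h, 0x6ED9EBA1, (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15), (3, 9, 11, 15)),
)


def md4(data: bytes) -> bytes:
    """Return the 16-byte MD4 digest of data (RFC 1320)."""
    # Pad: 0x80, zeros up to 56 mod 64, then the bit length as 64-bit little-endian.
    bit_len = (8 * len(data)) & 0xFFFFFFFFFFFFFFFF
    msg = data + b"\x80" + b"\x00" * ((55 - len(data)) % 64) + struct.pack("<Q", bit_len)

    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    for off in range(0, len(msg), 64):
        x = struct.unpack("<16I", msg[off:off + 64])
        a, b, c, d = state
        for func, const, order, shifts in _ROUNDS:
            for i, k in enumerate(order):
                a = _rotl(a + func(b, c, d) + x[k] + const, shifts[i % 4])
                # Rotate roles so the next step updates the next register (ABCD -> DABC ...).
                a, b, c, d = d, a, b, c
        state = [(s + v) & MASK32 for s, v in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)


def uri_id64(uri: str) -> int:
    """Hypothetical helper: first 8 bytes of MD4(UTF-8 URI) as a little-endian 64-bit int."""
    return int.from_bytes(md4(uri.encode("utf-8"))[:8], "little")


if __name__ == "__main__":
    print(hex(uri_id64("http://www.example.com/index.html")))
```

Pure Python is far slower than the C implementations the answer benchmarks; the sketch only shows the digest-then-truncate step.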

Cryptographic hash functions are designed for resilience to explicit attempts at building collisions. Conceivably, one can build a faster function by relaxing that condition (it is easier to beat random collisions than a determined attacker). There are a few such functions, e.g. MurmurHash. However, it may take quite a specific setup to actually notice the speed difference. On my home PC (a 2.4 GHz Core2), I can hash about 10 million short strings per second with MD4, using a single CPU core (I have four cores). For MurmurHash to be faster than MD4 in a non-negligible way, it would have to be used in a context involving at least one million hash invocations per second. That does not happen very often...
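For comparison with the non-cryptographic route, here is a Python sketch of MurmurHash64A, the 64-bit variant in the MurmurHash2 family. It follows the structure of the reference C++ code (8-byte blocks read little-endian as on x86, multiply/shift mixing, a tail fold, and a final avalanche), but treat it as illustrative and compare it with the reference implementation before relying on its exact output. The function name and the default seed of 0 are assumptions. Being pure Python, it says nothing about the speed comparison above.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
M = 0xC6A4A7935BD1E995  # MurmurHash64A multiplier
R = 47                  # MurmurHash64A shift


def murmur64a(data: bytes, seed: int = 0) -> int:
    """Sketch of MurmurHash64A returning an unsigned 64-bit int."""
    n = len(data)
    h = (seed ^ (n * M)) & MASK64

    # Body: mix each 8-byte block (read little-endian) into h.
    nblocks = n // 8
    for i in range(nblocks):
        k = int.from_bytes(data[8 * i:8 * i + 8], "little")
        k = (k * M) & MASK64
        k ^= k >> R
        k = (k * M) & MASK64
        h ^= k
        h = (h * M) & MASK64

    # Tail: fold the remaining 0-7 bytes in (little-endian), then mix once.
    tail = data[8 * nblocks:]
    if tail:
        h ^= int.from_bytes(tail, "little")
        h = (h * M) & MASK64

    # Finalization: spread the high bits into the low bits.
    h ^= h >> R
    h = (h * M) & MASK64
    h ^= h >> R
    return h


if __name__ == "__main__":
    print(hex(murmur64a(b"http://www.example.com/index.html")))
```

Unlike MD4, a function like this makes no attempt to resist deliberately constructed collisions, which is exactly the condition the answer says can be relaxed when only random collisions matter.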


I'd wait a little longer for MurmurHash3 to be finalized, then use that. The 128-bit version should give you adequate collision protection against the birthday paradox.
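The birthday-paradox point can be made concrete with the standard approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n items and a b-bit hash, assuming the hash behaves like a uniformly random function. The function name below is hypothetical; `math.expm1` keeps very small probabilities from rounding to zero.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound estimate that n random `bits`-bit hashes contain at least one collision.

    Uses p ~= 1 - exp(-n(n-1) / 2^(bits+1)), assuming uniformly random hash outputs.
    """
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2.0 ** bits)


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**10):
        print(f"n={n:.0e}: 64-bit p={collision_probability(n, 64):.3g}, "
              f"128-bit p={collision_probability(n, 128):.3g}")
```

For a billion URIs this estimate gives roughly a 2.7% chance of at least one collision with 64-bit IDs, versus about 1.5e-21 with 128-bit IDs.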
