Does anyone know of a good library (windows) that will allow me to create a static (not runtime) perfect hash for millions of items (probably about 10m)?
I essentially have millions of se开发者_StackOverflow社区ts of strings and I want to know at a minimal O(1) if a string is in my set or not - that's it. I don't need it to actually look up the string - there's no value behind it (other than the existance).
Try:
- Bob Jenkins's perfect.
- CMPH
- gperf
perfect and gperf produce tables in C code form, which should work fine on Windows. I don't know what CMPH's output is.
CMPH has a comment saying:
gperf is a bit different, since it was conceived to create very fast perfect hash functions for small sets of keys and CMPH Library was conceived to create minimal perfect hash functions for very large sets of keys.
If that's correct, then with your million-key case, you should probably prefer CMPH to gperf. I don't know how they compare to Jenkins's perfect. It should be simple enough to try all three and benchmark them against each other.
A Bloom filter will do what you want, I would look around for libraries that have them or you can attempt to write one yourself.
精彩评论