In C++, how do you handle hash collisions in a hash map? And how much time does it take to search for an element once a collision has occurred?
Also, what makes a good hash function?
There are dozens of different ways to handle collisions in hash maps depending on what system you're using. Here are a few:
- If you use closed addressing (also called separate chaining), each bucket would typically hold a linked list of all the values whose hash codes map to that bucket, and you would traverse the list looking for the element in question.
- If you use linear probing, then following a hash collision you would start looking at adjacent buckets until you found the element you were looking for or an empty spot.
- If you use quadratic probing, then following a hash collision you would look at the elements 1, 3, 6, 10, 15, ..., n(n+1)/2, ... away from the collision point in search of an empty spot or the element in question.
- If you use cuckoo hashing, you would maintain two hash tables, then displace the element that you collided with into the other table, repeating this process until the collisions resolved or you had to rehash.
- If you use dynamic perfect hashing, you would build up a perfect hash table from all elements sharing that hash code.
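To make the linear probing item above concrete, here is a minimal sketch, not a production table. The class name `LinearProbingSet` and its members are hypothetical, the capacity is fixed, there is no deletion or resizing, and the "hash" is just the key modulo the capacity so that collisions are easy to reproduce.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity set of ints using linear probing.
// Assumptions: no deletion, no resizing, trivial hash (key mod capacity).
class LinearProbingSet {
public:
    explicit LinearProbingSet(std::size_t capacity) : slots_(capacity) {
        assert(capacity > 0);
    }

    // Returns false if the key is already present or the table is full.
    bool insert(int key) {
        const std::size_t start = home(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            Slot& s = slots_[(start + i) % slots_.size()];
            if (!s.used) {          // first empty spot: store the key here
                s.used = true;
                s.key = key;
                return true;
            }
            if (s.key == key) return false;  // duplicate
        }
        return false;  // every slot was occupied
    }

    // Returns the slot index holding `key`, or -1 if it is absent.
    int find_slot(int key) const {
        const std::size_t start = home(key);
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            const std::size_t idx = (start + i) % slots_.size();
            if (!slots_[idx].used) return -1;  // an empty spot ends the search
            if (slots_[idx].key == key) return static_cast<int>(idx);
        }
        return -1;
    }

    bool contains(int key) const { return find_slot(key) != -1; }

private:
    struct Slot {
        bool used = false;
        int key = 0;
    };

    // The bucket a key would occupy if nothing collided with it.
    std::size_t home(int key) const {
        return static_cast<std::size_t>(static_cast<unsigned>(key)) % slots_.size();
    }

    std::vector<Slot> slots_;
};
```

With a capacity of 8, the keys 1, 9 and 17 all share home bucket 1, so they end up in slots 1, 2 and 3 in insertion order. Quadratic probing would differ only in how the offset from `start` is computed.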
The particular implementation you pick is up to you. Go with whatever is simplest. I personally find chained hashing (closed addressing) the easiest, if that suggestion helps.
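For reference, a chained table can be sketched in a few lines. This is only an illustration under stated assumptions: `ChainedMap`, `put` and `get` are hypothetical names, keys are `std::string`, values are `int`, and the bucket count never grows.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical chained hash map (string -> int) with a fixed bucket count.
class ChainedMap {
public:
    explicit ChainedMap(std::size_t bucket_count = 16) : buckets_(bucket_count) {
        assert(bucket_count > 0);
    }

    // Inserts the pair, or overwrites the value if the key already exists.
    void put(const std::string& key, int value) {
        auto& chain = buckets_[index(key)];
        for (auto& kv : chain) {
            if (kv.first == key) {
                kv.second = value;
                return;
            }
        }
        chain.emplace_back(key, value);
    }

    // Returns a pointer to the stored value, or nullptr if the key is absent.
    const int* get(const std::string& key) const {
        const auto& chain = buckets_[index(key)];
        for (const auto& kv : chain) {   // walk the chain for this bucket
            if (kv.first == key) return &kv.second;
        }
        return nullptr;
    }

private:
    std::size_t index(const std::string& key) const {
        return std::hash<std::string>{}(key) % buckets_.size();
    }

    std::vector<std::list<std::pair<std::string, int>>> buckets_;
};
```

Constructing the map with a single bucket, `ChainedMap m(1);`, forces every key into the same chain, which is a convenient way to exercise the collision path.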
As for what makes a good hash function, that really depends on what type of data you're storing. Hash functions for strings are often very different from hash functions for integers, for example. Depending on the security guarantees you want, you may want to pick a cryptographically secure hash like SHA-256, or just a simple heuristic like a linear combination of the individual bits. Designing a good hash function is quite tricky, and I'd advise doing a bit of digging for advice on the particular structures you're going to be hashing before coming to a conclusion.
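As one example of a simple non-cryptographic string hash, here is a sketch of 64-bit FNV-1a plugged into `std::unordered_map` as a custom hasher. FNV-1a is my choice for illustration, not something the question calls for; the names `Fnv1aHash`, `bucket_for` and `WordCounts` are hypothetical, and the code assumes a 64-bit `std::size_t` (on a 32-bit platform the result is truncated).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical hasher implementing 64-bit FNV-1a over the bytes of a string.
struct Fnv1aHash {
    std::size_t operator()(const std::string& s) const noexcept {
        std::uint64_t h = 14695981039346656037ULL;  // FNV-1a 64-bit offset basis
        for (unsigned char c : s) {
            h ^= c;                   // mix in one byte
            h *= 1099511628211ULL;    // multiply by the 64-bit FNV prime
        }
        return static_cast<std::size_t>(h);
    }
};

// Maps a hash value to a bucket index; bucket_count must be nonzero.
inline std::size_t bucket_for(std::size_t hash, std::size_t bucket_count) {
    assert(bucket_count > 0);
    return hash % bucket_count;
}

// The standard container accepts the hasher as its third template argument.
using WordCounts = std::unordered_map<std::string, int, Fnv1aHash>;
```

Because each byte is mixed in before the multiplication, strings that contain the same characters in a different order, such as "ab" and "ba", get different hash values, which a plain sum of the characters would not achieve.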
Hope this helps!
Generally a hash map structure stores colliding elements in either a list or a tree. If they are in a list, it costs O(1) time to append a new element (assuming you don't first scan the list for a duplicate key), but O(N) to retrieve one, where N is the number of colliding elements rather than the total number in the hash map. If a tree is used, insertion and lookup are both O(log N).
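The effect of collisions on lookup cost can be observed with the bucket interface of `std::unordered_map`. The sketch below uses a deliberately bad hasher that sends every key to the same bucket; `ConstantHash`, `chain_length` and `make_colliding_map` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Deliberately terrible hasher: every key gets the same hash value,
// so every key lands in the same bucket.
struct ConstantHash {
    std::size_t operator()(int) const noexcept { return 0; }
};

// Number of elements stored in the bucket that `key` maps to.
// Assumes the map has at least one bucket.
template <class Map>
std::size_t chain_length(const Map& m, const typename Map::key_type& key) {
    return m.bucket_size(m.bucket(key));
}

// Builds a map of n entries (i -> i*i) in which all keys collide.
inline std::unordered_map<int, int, ConstantHash> make_colliding_map(int n) {
    assert(n >= 0);
    std::unordered_map<int, int, ConstantHash> m;
    for (int i = 0; i < n; ++i) {
        m[i] = i * i;
    }
    return m;
}
```

For a map built with `make_colliding_map(100)`, `chain_length(m, 0)` equals `m.size()`, so every lookup may have to walk up to 100 elements. With a reasonable hasher the same call would normally report a much shorter chain.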
A good hash function is one which minimizes collisions. Which function this is depends on your particular data, but in general a good choice is a hash whose outputs show no obvious relationship to patterns in its inputs, so that items are scattered across the space as if at random (while the same input still always produces the same output).