开发者

How to check whether my custom hashing is good in hash_map?

开发者 https://www.devze.com 2022-12-22 07:29 出处:网络
I\'ve written a custom hashing for my custom key in stdext::hash_map and would like to check whether the hasher is good. I\'m using STL supplied with VS 2008. A typical check, as I know, is to check t

I've written a custom hashing for my custom key in stdext::hash_map and would like to check whether the hasher is good. I'm using STL supplied with VS 2008. A typical check, as I know, is to check the uniformity of distribution among buckets.

How should I or开发者_如何学Goganize such a check correctly? A solution that comes to my mind is to modify STL sources to add a method to hash_map that walks through buckets and does the subject. Is there are any better ways?

Maybe, derive from hash_map and create there such method?


Your best bet might be to just take your hashing algorithm to an array of ints and count the number of times that each hash bucket is hit, given real-world data. (I'm suggesting taking the STL out of the equation here, really.)

If you end up seeing high deviation in your counts with large sets of real-world data, your hashing algorithm is generating lots of collisions when there are plenty of empty (or emptier) buckets available.

Note that 'high deviation' is a relative term. A good hash algorithm is a deterministic random process and any random process has a chance of generating strange results, so test often, test well, and wherever possible, use your actual problem domain as a source of your tests and your controls.


I'd run one (large) dataset through stl::hash_map. Once done, I'd collect the results for all buckets using the following method

From hash_map:

size_type elems_in_bucket (size_type __n) const;

Finally, I would do compute the standard deviation (SD) of the elem-to-bucket distribution.

I'd do the above for different hash functions. Whichever hash function results in minimum SD is the winner (for this dataset).

0

精彩评论

暂无评论...
验证码 换一张
取 消