About secondary clustering in hash tables_问答_开发者

About secondary clustering in hash tables

开发者 https://www.devze.com 2023-04-04 15:35 出处：网络

Although quadratic probing eliminates primary clustering, elements that hash to the same position will probe the same alternate cells.

相关专题：algorithm

Although quadratic probing eliminates primary clustering, elements that hash to the same position will probe the same alternate cells. This is known as secondary clustering. Simulation results suggest that it generally causes less than an extra probe per search.

Above开发者_Python百科 is text snippet from algorihms book from Mark Allen Wessis book.

My question can some one explain with example what is secondary clustring and what does author mean by "Simulation results suggest that it generally causes less than an extra probe per search".

Thanks!

Secondary clustering is defined in the piece of text you quoted: instead of near the insertion point, probes will cluster around other points.

"Simulation results suggest that it generally causes less than an extra probe per search" means someone tried to insert or find lots of data in a hash table with quadratic probing, and found that, on average, less than two probes were needed to find the right spot in the hash table. (One probe is of course the minimum needed to insert or find anything in a hash table.)

The terms primary and secondary clustering are probably fairly standard, because they are in Knuth Vol 3 section 6.4 as well. Here he considers a hash function on a key to gain a first hash value h(K) and then various ways of working out where to go if the slot in the table suggested by h(K) is full. The problem is - what happens when the table is near enough full that some sections of the table are getting full up.

If the alternate slots depend on a completely separate function g(K) then you get reasonably good behaviour, with no clustering, but you have the cost of computing g(K).

If the alternate slots are h(K) + i, for some i, then contiguous sections of the hash table that happen to get a lot of hits form contiguous areas of completely full up slots. In fact, nearby areas will spread out and collide with each other. Knuth describes this as primary clustering.

If the alternate slots are something like h(K) + i^2 then you don't get nearby areas colliding to form large all-full contiguous sections (primary clustering) but you don't get as nice a behaviour as you do with two completely separate hash functions, because in this case if h(K1) = h(K2), then the set of possible slots for both keys are completely identical, so they will collide with each other to some extent.

PS if you are interested in the details of hashing, you might like to know that there is more recent work as well - http://en.wikipedia.org/wiki/Cuckoo_hashing