The hash structures I am aware of are HashTable, HashSet & HashMap.
Do they all use the bucket structure - i.e. when two hashcodes are exactly the same, one element does not overwrite the other; instead, both are placed in the same bucket associated with that hashcode?
In Sun's current implementation of the Java library, IdentityHashMap and the internal map implementation in ThreadLocal use probing structures.
The general problem with probing hash tables in Java is that hashCode and equals may be relatively expensive, so you want to cache the hash value. You can't have an array that mixes references and primitives, so you'd need to do something relatively complicated. On the other hand, if you are using == to check matches, then you can check many references without a performance problem.
IIRC, Azul had a fast concurrent quadratic probing hash map.
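Roughly, the identity-keyed probing idea looks like this - a minimal sketch of a linear-probing table keyed by reference identity (the class SimpleIdentityMap and its layout are hypothetical and simplified, not Sun's actual IdentityHashMap code):

    // Hypothetical sketch: an identity-keyed map using linear probing,
    // in the spirit of (but much simpler than) java.util.IdentityHashMap.
    final class SimpleIdentityMap<K, V> {
        private final Object[] keys = new Object[32];    // power-of-two capacity, fixed for brevity
        private final Object[] values = new Object[32];

        @SuppressWarnings("unchecked")
        public V get(K key) {
            int i = indexFor(key);
            while (keys[i] != null) {                    // probe until an empty slot
                if (keys[i] == key) {                    // identity check: cheap, no equals() call
                    return (V) values[i];
                }
                i = (i + 1) & (keys.length - 1);         // linear probing, wrap around
            }
            return null;
        }

        public void put(K key, V value) {
            int i = indexFor(key);
            while (keys[i] != null && keys[i] != key) {
                i = (i + 1) & (keys.length - 1);
            }
            keys[i] = key;
            values[i] = value;
            // A real implementation would track size and resize before the table fills up.
        }

        private int indexFor(Object key) {
            return System.identityHashCode(key) & (keys.length - 1);
        }
    }

Because matches are checked with ==, scanning several occupied slots during a probe is cheap, which is what makes this layout attractive for identity-based maps.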
A linked list is used at each bucket for dealing with hash collisions. Note that the Java HashSet is actually implemented by a HashMap underneath (all keys being mapped to the same singleton value, shared across all HashSets) and hence uses the same bucket structure.
When an element is added, its equality is checked against all items in the linked list (via .equals) before it is appended at the end. Hence hash collisions are particularly bad: the check becomes increasingly expensive as the linked list grows.
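For illustration, a stripped-down sketch of that delegation idea (the class name SimpleHashSet is made up; the real java.util.HashSet implements the full Set API and more):

    import java.util.HashMap;

    // Simplified sketch of how a Set can be backed by a Map: every element
    // becomes a key, and all keys map to the same shared dummy value.
    final class SimpleHashSet<E> {
        private static final Object PRESENT = new Object();   // shared singleton value
        private final HashMap<E, Object> map = new HashMap<>();

        public boolean add(E e) {
            // put() returns the previous value, or null if the key was absent,
            // so a null result means the element was newly added.
            return map.put(e, PRESENT) == null;
        }

        public boolean contains(Object o) {
            return map.containsKey(o);
        }

        public int size() {
            return map.size();
        }
    }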
I believe the Java hash structures all use a form of chaining to deal with collisions when performing the hashing - items that hash to the same bucket are placed into a list.
I do not believe that Java uses open addressing for its hash-based data structures (open addressing recomputes indices along a probe sequence until it finds an open slot in the table).
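A toy illustration of chaining, assuming a fixed bucket count and ignoring resizing (ChainedMap is a hypothetical class, not JDK code):

    import java.util.LinkedList;

    // Toy chained hash table: each bucket holds a linked list of entries
    // whose keys hash to the same bucket index.
    final class ChainedMap<K, V> {
        private static final int BUCKETS = 16;

        private static final class Entry<K, V> {
            final K key;
            V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        @SuppressWarnings("unchecked")
        private final LinkedList<Entry<K, V>>[] buckets = new LinkedList[BUCKETS];

        public void put(K key, V value) {
            int i = Math.floorMod(key.hashCode(), BUCKETS);
            if (buckets[i] == null) buckets[i] = new LinkedList<>();
            for (Entry<K, V> e : buckets[i]) {
                if (e.key.equals(key)) {                   // same key: overwrite the value
                    e.value = value;
                    return;
                }
            }
            buckets[i].add(new Entry<>(key, value));       // collision: append to the chain
        }

        public V get(K key) {
            int i = Math.floorMod(key.hashCode(), BUCKETS);
            if (buckets[i] == null) return null;
            for (Entry<K, V> e : buckets[i]) {
                if (e.key.equals(key)) return e.value;
            }
            return null;
        }
    }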
No -- open addressing is an alternative method of representing hash tables, where objects are stored directly in the table instead of residing in a linked list. Only one object can be stored at a given index, so resolving collisions is more complicated.
When adding an object whose index is already occupied by another object, a probing sequence is used to determine the new index at which to store the new object. Removing objects is also more complicated: if you remove an object, you need to leave a marker that says "there used to be an object here"; for more details, see Wikipedia.
Open addressing is preferable when the objects being stored are small and will rarely be deleted. Open addressing has improved cache performance, since you don't need to go through an extra level of indirection walking a linked list.
The classes you mentioned -- HashTable, HashSet, and HashMap -- don't use open addressing, but you could easily create new classes that implement open addressing and provide the same APIs as those classes.
The APIs define the behaviour; the internals of how hash collisions are managed don't affect the guarantees of the API. The performance impact of bad hash value computation is another story, though. Let's just hash everything to 42 and see how it behaves.
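For example, something like this is perfectly legal and still behaves correctly as far as the Map contract goes - it just forces every entry into one bucket (AnswerKey is a made-up class for the demo):

    import java.util.HashMap;
    import java.util.Map;

    // A key whose hashCode() always returns 42: correct but pathologically slow,
    // because every entry collides into the same bucket.
    final class AnswerKey {
        private final String name;

        AnswerKey(String name) { this.name = name; }

        @Override public boolean equals(Object o) {
            return o instanceof AnswerKey && ((AnswerKey) o).name.equals(name);
        }

        @Override public int hashCode() {
            return 42;   // legal, but defeats the whole point of hashing
        }

        public static void main(String[] args) {
            Map<AnswerKey, Integer> map = new HashMap<>();
            for (int i = 0; i < 10_000; i++) {
                map.put(new AnswerKey("key" + i), i);   // all 10,000 entries share one bucket
            }
            System.out.println(map.size());             // still 10000 -- behaviour is intact, only speed suffers
        }
    }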
Map and Set are the interfaces that determine the behavior of a HashMap or HashSet. A HashSet is a Set, and so it behaves like a Set (i.e. duplicates are not allowed). A HashMap acts like a Map - it will not overwrite a key that merely has the same hashcode, but it will overwrite an entry if the exact same key is used again. This will be the same regardless of what data structure is backing the Map internally. See the javadoc for Set and HashMap for more.
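A quick demo of that (CollidingKey is a made-up class whose instances all share one hashcode):

    import java.util.HashMap;
    import java.util.Map;

    // Two distinct keys with the same hashcode coexist; re-using the *same* key overwrites.
    final class CollidingKey {
        private final String id;

        CollidingKey(String id) { this.id = id; }

        @Override public boolean equals(Object o) {
            return o instanceof CollidingKey && ((CollidingKey) o).id.equals(id);
        }

        @Override public int hashCode() { return 7; }    // every instance collides

        public static void main(String[] args) {
            Map<CollidingKey, String> map = new HashMap<>();
            map.put(new CollidingKey("a"), "first");
            map.put(new CollidingKey("b"), "second");    // same hashcode, different key: both kept
            System.out.println(map.size());              // 2
            map.put(new CollidingKey("a"), "replaced");  // equal key: value overwritten
            System.out.println(map.size());              // still 2
            System.out.println(map.get(new CollidingKey("a")));  // replaced
        }
    }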
Did you mean to ask something about the specific implementation of one of these structures?
Except the HashSet. Set is by definition unique elements.
This was a mistake. Please see the comments below.