I have check out this开发者_运维问答 Wikipedia page on it, but I still don't understand it. Can someone please help my dim-witted mind to understand the concepts of hashing, hashtable/hashmap, and hash functions? Some examples would really help.
The Wikipedia article will have a lot of technical information, but a simplistic view of hashing is something like the following.
Imagine that there's a magical function that can give a number to any object. Given the same object, it always return the same number.
Immediately now you have a quick way to test if two objects are the same: ask this function for their numbers and compare. If they're different, then they're not the same.
But what if they have the same number? Could two different objects have the same number?
Yes, this is possible in most scenario. Let's say that the function can only give numbers between 1..10, for example, and there are 100 different objects. Then of course some different objects must have the same number. This is what is called a "collision". A "collision" makes our quick equality test not as useful, so as much as possible we want to minimize its happening. A good magical function is one that would try to minimize the number of "collision".
So what else can you do with this number? Well, you can use it to index an array. Given an object, you can put it at the index given by the number from this magical function. This array is essentially what a hashtable is; this magical function is a hash function.
A hash function is a way to create a compact representation of an arbitrarily large amount of data. In java with the hashcode method this means somehow describing the state of your object (no matter how large) in an int (4 bytes). And is usually written to be a fairly fast as explained below.
To simplify in hashtables/hashmaps the hashcode serves as sort of a cheap equals. Take two objects a and b of type Foo lets says to figure out if a.equals(b) takes 500 ms where as calculating a (efficient) hashcode only take 10ms. So if we want to know if a.equals(b) instead of doing that directly first we will look at the hashcodes and ask does a.hashCode() == b.hashCode(). Note that this will take only 20ms in our example.
Because of the API definition of hashcode we know that if the hashcode of a is not equal to b then a.equals(b) should never be true. So in our above test if we see the hashcodes are unequal then we never need to do the longer .equals() test, this is why you should always override hashCode and equals together.
You may also see references about writing "good" or "well distributed" hashcodes. This has to do with the fact that the inverse of the previous statements about hashcode and equals is not true. More specifically a.hashCode() == b.hashCode() does not necessarily imply a.equals(b) So the idea of a good hashcode is you reduce the likelyhood of a.hashCode() == b.hashCode() when a.equals(b) is false. You may have seen this referred to as a collision of a hash function.
Back to hashmaps/tables. These are based on key/value pairs. So when you add or retrieve a value you will supply a key. So the first thing the map has to do is look for the key, which means finding something that .equals() the key you provide. But as we discussed above .equals() can be incredibly slow which means comparisons can be greatly sped up by checking hashcodes first. Since when the hashcodes are well distributed you should know quickly when x is definitely != y.
Now in addition to the comparison hashmaps/tables actually use the hashcodes to organize their internal storage of the data, however I think that is beyond the scope of what you are looking to understand at this point.
HASH FUNCTION:- A hash function takes a group of characters (called a key) and maps it to a value of a certain length (called a hash value or hash). The hash value is representative of the original string of characters, but is normally smaller than the original. Hashing is done for indexing and locating items in databases because it is easier to find the shorter hash value than the longer string. Hashing is also used in encryption.This term is also known as a hashing algorithm or message digest function.
HASH MAP:- HashMap is a collection class that is designed to store elements as key-value pairs. Maps provide a way of looking up one thing based on the value of another.
A lookup table that is designed to efficiently store non-contiguous keys (account numbers, part numbers, etc.) that may have wide gaps in their alphabetic or numeric sequences.
HASH TABLE:- Hash tables are created with an algorithm that stores the keys into hash buckets, which contain key-value pairs. Since different keys may hash to the same bucket, the goal of hash table design is to spread out the key-value pairs evenly with each bucket containing as few key-value pairs as possible. When an item is looked up, its key is hashed to find the appropriate bucket, and the bucket is then compared to find the right key-value pair.
This book (and supporting video lectures) provide some great explanation of algorithms and data structures. There are some lectures about hash functions (1, 2). I'd recommend that.
(source: mit.edu)
Also, just FYI, Not really true, as pointed by polygenelubricants in comments.hashCode()
, called on an instance of Object
class returns an address of this particular instance in memory.
A hashtable is basically a way to store anything in an array and retrieve it almost as fast as looking up something in an array via an index, without wasting too much space.
The job of a hash function is (in this context) to compute the array index at which an object will be stored, based on the contents of the object. That means, it must always return the same result for the same object, and should return different results for different objects as much as possible. When two different objects have the same hash, it's called a "collision", and you have to treat those cases specially, which makes the whole thing slower.
The mapping of keys to indices of a hash table is called hash function. Hash function contains two parts
Hash Code Map : It converts keys to integer of any range.
Compression Map : It converts(brings) these integers to the range of keys hashtable has.
Taken from http://coder2design.com/hashing/
Hash function: If you pass the same object to this function any number of times, be it text, binary or number, you always get the same output. For the hash table purposes an integer returning hash function is used.
Above functionality is calling hashing.
Hash table: Miracle data structure of computer science that returns search result in constant time or O(1). It is based on the above concept of hashing. So, it has better access time than linkedlist, binary search trees etc.
Why nearly O(1): It uses an array as its base structure internally to store the objects and as arrays have constant access time hence, the Hash table does too.
[Basic internal]: So, it uses an array of fixed size internally and when you insert an (Key, Value) pair, it calculates key's hash and uses this hash value as index to store the (Key,Value) pair in the array. Next, when you search for the object using the same key, it uses the hash of the key again as index to search for the key in the array. Now, two objects can have same hash value and hence, while inserting these objects in the hash table there will be collision. There are two ways for collision resolution. You can refer this link for a sufficiently detailed discussion on this topic.
HashCode()
function, which returns an integer value, is used by HashMap
to find a correct bucket.
精彩评论