开发者

Why if hashCode method is implemented, equals methods must also be implemented in case of keys in Dictionary the datatype?

开发者 https://www.devze.com 2022-12-09 20:58 出处:网络
Datat开发者_JS百科ype: Dictionary keys Can somebody please tell me importance of implementing both(hashCode/equals) of them simultaneously. because I think if we implement hashCode method equals is g

Datat开发者_JS百科ype: Dictionary keys

Can somebody please tell me importance of implementing both(hashCode/equals) of them simultaneously. because I think if we implement hashCode method equals is going to compare hashCodes and give us the equality.


A HashCode does not guarantee uniqueness. As an example, a HashCode takes 2^32 values in most languages. If you have a class of 4 ints, how many possible unique states/instances of that class could you have? (2^32)^4. Meaning even if you implement a perfect hashcode, you would still have 2^(32*3) collisions, where a pair of different objects have the same hashcode.

The HashCode is therefore used as a first 'fast' comparison, to find objects that look similar to what you are looking for. Once you get down to a group of objects, equality is checked on each to see if there is one which is precisely what you are looking for.


Just because hash codes are equal does not mean that the underlying objects are equal. There are a limited number of possible hash codes, so there are bound to be collisions. You should implement a robust .Equals() so that you can actually test for equality.


The problem is that just because two objects have the same hashcode doesn't mean that they're equal.

There are only 2^32 possible hash codes (32-bit integers). If you think about it, you'll realize that the number of possible strings is much much much larger. Therefore, not every string will have a unique hashcode.

Also, many classes' GetHashCode methods are poorly implemented.

For example, here is Point.GetHashCode from the .Net source code:

public override int GetHashCode() { 
    return x ^ y; 
}

Notice that (2, 3) will have the same hashcode as (3, 2), even though they're not equal. Although there are implementations that do not exhibit this behavior, they're still, by definition, not unique.


IMHO, the reason for implementing both hashcode and equals is this:

A hashtable allows fast access to elements based on keys. This is possible because of its implementation.

A hashtable internally uses buckets to store its values. Think of each bucket as an array. And there is an array of such buckets. Therefore it becomes a double dimensional array. The hashcode of the key is a mechanism by which the hashtable can directly jump to the index of the bucket in which the value is stored.

Eg:

Below I have written the code for a Class that I will use as the key for a HashMap instance.

package com.aneesh.hashtable;  
import java.util.HashMap;  
public class Key {

private String key;

public Key(String key){
    this.key = key;
}


@Override
public int hashCode() {
    return key.hashCode();
}


@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Key other = (Key) obj;
    if (key == null) {
        if (other.key != null)
            return false;
    } else if (!key.equals(other.key))
        return false;
    return true;
}


public static void main(String[] args) {

    HashMap<Key, String> hashMap = new HashMap<Key, String>();
    hashMap.put(new Key("a"), "java");
    hashMap.put(new Key("k"), "Python");

    System.out.println(hashMap.get(new Key("a")));
    System.out.println(hashMap.get(new Key("k")));

}
}

The hashCode implementation of the Key class is to simply return the hashCode of the instance variable 'key' that is of type String. The hashcode for "a" = 97 The hashcode for "k" = 107 //there is a reason why I choose these two keys which will become obvious soon.

When you do hashMap.put(new Key("a"), "java"); the hashtable has to figure out which bucket it has to put the key,value into. The code for that would be

int indexofBucket = key.hashCode() % numberOfBuckets //7, where key is "a"

So the key,value pair of ("a,"java") will be stored as the first element in the 7th bucket.

When you do hashMap.put(new Key("k"), "python"); the bucket index again gets calculated as indexofBucket = key.hashCode() % numberOfBuckets //7, where key = "k"

This is the same bucket, the bucket at the 7th index.

Now, when you retrieve a value by its key

hashMap.get(new Key("a"));

the hashtable would figure out the index thus:

indexOfBucket = key.hashCode() % numberOfBuckets //7

At this point the hashtable would find two elements in the bucket. Now which element is the one it has to return would be decided by (in a simple implementation I guess) iterating over each element and comparing the equals of the keys. Without the equals the hashtable might not even be able to find the element that you have added into it.

To see this in action, comment out the equals implementation of the Key class and run the code. You will see

null  
null 

being printed as output, whereas with the equals implemented, you will see an output of

"java",  
"python"

Long wounded explanation, but hope that helps

0

精彩评论

暂无评论...
验证码 换一张
取 消