开发者

is it incorrect to define an hashcode of an object as the sum, multiplication, whatever, of all class variables hashcodes?

开发者 https://www.devze.com 2022-12-29 02:02 出处:网络
Let\'s say I have the following class: class ABC { private int myInt = 1; private double myDouble = 2; private String myString = \"123\";

Let's say I have the following class:

class ABC {
    private int myInt = 1;
    private double myDouble = 2;
    private String myString = "123";
    private SomeRandomClass1 myRandomClass1 = new ...
    private SomeRandomClass2 myRandomClass2 = new ...

    //pseudo code
    public int myHashCode() {
        return 37 *
               myInt.hashcode() *
               myDouble.hashCode() *
      开发者_Python百科         ... *
               myRandomClass.hashcode()
    }
}

Would this be a correct implementation of hashCode? This is not how I usually do it(I tend to follow Effective Java's guide-lines) but I always have the temptation to just do something like the above code.

Thanks


It depends what you mean by "correct". Assuming that you're using the hashCode() of all the relevant equals()-defining fields, then yes, it's "correct". However, such formulas probably will not have a good distribution, and therefore would likely cause more collisions than otherwise, which will have a detrimental effect on performance.

Here's a quote from Effective Java 2nd Edition, Item 9: Always override hashCode when you override equals

While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and computer scientists. [...Nonetheless,] the techniques described in this item should be adequate for most applications.

It may not require a lot of mathematical power to evaluate how good your proposed hash function is, but why even bother? Why not just follow something that has been anecdotally proven to be adequate in practice?

Josh Bloch's recipe

  • Store some constant nonzero value, say 17, in an int variable called result.
  • Compute an int hashcode c for each field:
    • If the field is a boolean, compute (f ? 1 : 0)
    • If the field is a byte, char, short, int, compute (int) f
    • If the field is a long, compute (int) (f ^ (f >>> 32))
    • If the field is a float, compute Float.floatToIntBits(f)
    • If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as in above.
    • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
    • If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  • Combine the hashcode c into result as follows: result = 31 * result + c;

Now, of course that recipe is rather complicated, but luckily, you don't have to reimplement it every time, thanks to java.util.Arrays.hashCode(Object[]) (and com.google.common.base.Objects provides a convenient vararg variant).

@Override public int hashCode() {
    return Arrays.hashCode(new Object[] {
           myInt,    //auto-boxed
           myDouble, //auto-boxed
           myRandomClass,
    });
}

See also

  • Object.hashCode()

    It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.


Doing this sort of things is allowable by the contract. But so is always returning 1. There is a compile time flag in HotSpot to always return 1 for the identity hash vale. Such choices will, however, produce poor performance.

There is a particular problem with multiplication. Not only will a 0 hash value from a component annihilate the value, but powers of two will progressively zero the lower bits.

Commutative operators have the problem that rearrangements of values will cause a clash.

If there is a particular relationship between hash values of components, then addition will be particularly bad. (4, 6) and (2, 8) clashing, for instance.


No, but in practice it is almost certainly not a good idea. Most importantly you are not allowed to modify any of the fields that you use in the hash code. They all have to be constant.

If you modify one, this can happen: You insert the objecy in a HashSet, you change a fields, and then test if the object is in the HashSet. Although it is in there, due to the fact that the hash code changed, HashSet will not find it!


I'm probably much too late to this party, but.....

The problem with your hash function can be described by the commutative law of multiplication. - khan academy link

That is - a * b * c == c * b * a.

So by only multiplying you remove the uniqueness of the order. This means you are much more likely to get collisions.


Seems to me that unless you can guarantee that the product is a prime number you might get collision (though probably rare) between resulting hashcodes for an object

0

精彩评论

暂无评论...
验证码 换一张
取 消