开发者

Java: Getting a unique hash value of an object

开发者 https://www.devze.com 2023-02-10 06:49 出处:网络
I am trying to get a unique hash value for a Java object, such as the following are true: If A == B then A.HashValue() == B.Hash.HashValue()开发者_Go百科

I am trying to get a unique hash value for a Java object, such as the following are true:

  1. If A == B then A.HashValue() == B.Hash.HashValue()开发者_Go百科
  2. If A != B then A.HashValue() != B.HashValue()

Lets say the object contains several boolean and integer fields.


// Very important edit...

Gjorgji, I know you accepted the answer below as correct, but I have found it to be incorrect.

If you have a class like this:

class tiny {
    int a;
    public int hashCode() { return a; }
}

You have already maxed out all possible hash codes. (If this isn't clear why, please say so.)

So, if you add ANY more information to the object, if you want that information represented in the hashCode, you're going to have a collision somewhere.

But, for that matter, you don't really want to set out to get a hashCode that's 100% unique to an object. That's really not the point of hashCode!

The point of hashCode is to give you a "unique enough" identifier to the object so you can place it in a hash bucket. It's not for identification so much as it is for classification. The idea is if you have a whole bunch of objects, you're probably not going to have many collisions, so you are probably going to have pretty fast access to what you're looking for if you grouped items by their hashCode.

If this means you deselect my answer as correct, that's okay. It really isn't correct for what you're looking for. My hope is that you realize this explanation of hashCode leads you to the correct usage, thereby keeping the correctness. But as Mark clearly pointed out, this doesn't actually solve the issue you stated.

Below is the old answer:

===========================================================

A good article on it is found here, from Effective Java (hands down the best "I want to learn how to be a good Java developer" book out there.)

http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ313_029.htm

class Gjorgji {
    boolean a;
    boolean b;
    boolean c;
    int x;
    int y;

    // EDIT: I almost forgot a VERY important rule...
    // WHEN YOU OVERRIDE hashCode, OVERRIDE EQUALS (and vice versa)
    public int equals(Object o) {
        if(!(o instanceof Gjorgji) return false;
        Gjorgji g = (Gjorgji)o;
        return a == g.a && b == g.b && c == g.c && x == g.x && y == g.y;

    }

    public int hashCode() {
        int hash = x ^ y;
        hash *= a ? 31 : 17; // pick some small primes
        hash *= b ? 13 : 19;
        hash *= c ? 11 : 29;
        return hash;
    }

}


This is not possible in general, you must guarantee that if a.equals(b), then a.hashCode() == b.hashCode(). You cannot guarantee the reverse: you can always have collisions because the hashCode method only has a 32bit space and your JVM can have a 64bit space for identity hashcodes.


I am trying to get a unique hash value for a Java object...Lets say the object contains several boolean and integer fields.

To go this you need longer than a 32-bit integer then, or you need to define constraints on the range of your fields. It's simply impossible to stuff more than 32 bits of information into 32 bits, and having an int and a boolean alone is 33 bits of info (assuming every value of the int is possible).

A long won't even be big enough if you have more than one int field. You'd need to go into a BigInteger, BitSet or a byte array.

Anyway, say your data doesn't span more than 32 bits. Then it's just a matter of arranging your data into a bit field represented by an int.

byte a;
byte b;
boolean c;
boolean d;

int hash = (a << 24) | (b << 16) | (c ? 0x02 : 0) | (d ? 0x01 : 0);


//layout
//index:                         ... 3210                             
//aaaa aaaa bbbb bbbb 0000 0000 0000 00cd

This doesn't make it a well-distributed hash (for using in a hash table, e.g.). However, if you want to guarantee uniqueness you probably aren't trying to use it for a hash table are you?

I'm curious why you have this bizarre requirement. The normal purpose of a hash is to get a value that is likely to be unique but of fixed (reduced) size. Your requirement guarantees that the hash needs to be as wide as the data it represents.


You can do this if you can limit the number of instances of your class to under 232. Here's one way:

import java.util.concurrent.atomic.AtomicInteger;

class UniqueHash {
    private static AtomicInteger NEXT_HASH_CODE = new AtomicInteger();
    private final int hashCode;

    UniqueHash() {
        while (true) {
            int nextHashCode = NEXT_HASH_CODE.get();
            if (nextHashCode == -1) {
                throw new RuntimeException("Too many instances!");
            }
            if (NEXT_HASH_CODE.compareAndSet(nextHashCode, nextHashCode + 1)) {
                hashCode = nextHashCode;
                break;
            }
        }
    }

    public int hashCode() {
        return hashCode;
    }
}

Edit 1: this was assuming that by "a == b" you meant a == b in the sense of object identity. You mention in the comments that you actually mean if the fields are equal. See the responses by @Mark Peters and @sjr.

Edit 2: fixed bug pointed out by @Tom Hawtin - tackline, left other bad practice in place. :)

Edit 3: there was a race in my "fix". Fixed the race.


Use System.identityHashCode()

http://download.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#identityHashCode(java.lang.Object)

Edit: it's true that you can't guarantee uniqueness of hash codes with this method; however, I think it's the best you can do given that you can't get an Object's memory location. Any other hash function you come up with will necessarily have the property that two structurally equivalent Objects hash to the same value, whereas this function at least gives you a chance of all of the Objects your program creates having different hashcodes.

For completeness: an Object's default hash code is computed once, when the Object is constructed, from its initial memory location. So if more than one Object is created with the same initial memory location they will necessarily have the same hash code.


How to get a "unique ID" -- I do not really recommend this :-) However, it does fulfill the requirements in the question. See IdentityHashMap and consider weak references.

  1. Use a Map object -> integer where the integer represents a counter.
  2. For each new object seen, increment the counter and add it to the Map.
  3. For each existing object, return the value stored.

There may also be implementation-specific methods: for instance, on Sun, I believe Object.toString (base method) always returns a unique string per that objects lifetime. The "encoded number" could be siphoned out and is the "internal reference" AFAIK.

I make no guarantees about accuracy of the preceding paragraph. YMMV. Happy coding.

0

精彩评论

暂无评论...
验证码 换一张
取 消