开发者

HashMap collisions: is my code correct?

开发者 https://www.devze.com 2023-01-25 00:26 出处:网络
I want to have one DateWrapper - representing a date (built for Hibernate persistance, but this is another story) - at most existing at the same time for the same date.

I want to have one DateWrapper - representing a date (built for Hibernate persistance, but this is another story) - at most existing at the same time for the same date.

I'm a bit confused about collisions and good keys for hashing. I'm writing a factory for a DateWrapper object, and I thought to use the milliseconds of the parsed date as the key as I've seen others doing. But, what happens if there is a collision?. Milliseconds are always different from one another, but the internal table may be smaller than the Long that could exist. And once the hash map has a collision, it uses the equals, but how can it distinguish two different object from my Long? Maybe, it's the put method to drop (overwrite) some value I'd like to insert... So, is this code safe, or is it bugged??

package myproject.test;

import java.util.HashMap;
import java.util.Map;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

import myproject.utilities.DateWrapper;

public class DateWrapperFactory {

    static Map <Long, DateWrapper> cache = new HashMap<Long, Dat开发者_如何学运维eWrapper>();
    static DateTimeFormatter parser =
        DateTimeFormat.forPattern("yyyy-MM-dd");

    static DateWrapperFactory instance = new DateWrapperFactory();

    private DateWrapperFactory() {
    }

    public static DateWrapperFactory getInstance() {
        return instance;
    }


    public static DateWrapper get(String source) {
        DateTime d = parser.parseDateTime(source);
        DateWrapper dw = cache.get(d.getMillis());
        if (dw != null) {
            return dw;
        } else {
            dw = new DateWrapper(d);
            cache.put(d.getMillis(), dw);
            return dw;
        }
    }

}

package myproject.test;

import org.joda.time.DateTime;

public class DateWrapper {

    private DateTime date;

    public DateWrapper(DateTime dt) {
        this.date = dt;
    }

}


If you are using the actual objects you want as map keys and letting the HashMap take care of the details of what it does with the hashcode of those objects (and the keys implement equals and hashCode according to their contract) there will be no issue if there's a hashcode collision other than some possible performance reduction due to the need to search linearly through every entry that hashed to the same bucket.

The issue in your other question where the subject of collisions came up was that rather than using the actual object that should have been the key, you were using the hashcode of that object as the key itself. This was incorrect and would have led to incorrect behavior.... when you went to look up the value for a given key in the map, the result could have been the value that actually maps to a completely different key that just happens to have the same hashcode.

The moral of the story is: use the actual key or something that is definitely equivalent (like the millis of the DateTime in this case) as the key, not the key's hashcode. The HashMap does what it needs with the hashcode for you.


The equals() will be called on the Long keys, not the values. You are fine.


With HashMap, you can store only one entry under any given key value (e.g. Long in your case).

On a side note, if there is any chance of concurrency, you may want to use a ConcurrentHashMap and putIfAbsent() instead of non-atomic get/if/put calls.


Given what you're ultimately trying to accomplish with this, that doesn't seem too terribly productive. You have a highly optimized data structure specifically designed for fast searching and enforcing uniqueness, called a Database Index. And extremely robust in-memory and L2 caching already in place for you from Hibernate. Which incidentally does not have the thread safety issues of putting a HashMap on a static field.

Why not make that number the value of the ID column in your database and let the robust platform technologies take care of finding it quickly and caching it for you? An in-memory L2 cache hit is really not that much slower than a HashMap in the grand scheme of things. It would be a pretty rare application where the difference is one of your meaningful hotspots.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号