Keep 'obvious' lock-retrieve or employ double-checked locking?

I suck at formulating questions. I have the following piece of (Java) code (pseudo):

public SomeObject getObject(Identifier someIdentifier) {
    // getUniqueIdentifier returns the canonical (singleton) instance of the identifier,
    // so that no two Identifiers in the system are equals() but not == (reference-equal).
    Identifier singletonInstance = getUniqueIdentifier(someIdentifier);
    synchronized (singletonInstance) {
        SomeObject cached = cache.get(singletonInstance);
        if (cached != null) {
            return cached;
        } else {
            SomeObject newInstance = createSomeObject(singletonInstance);
            cache.put(singletonInstance, newInstance);
            return newInstance;
        }
    }
}

Basically, it makes an identifier 'unique' (reference-equal, as in ==), checks a cache, and on a cache miss calls an expensive method (involving a call to an external resource, parsing, etc.), puts the result in the cache, and returns it. Synchronizing on the canonical Identifier prevents two Identifier objects that are equals() but not == from calling the expensive method at the same time, which would fetch the same resource twice concurrently.

The above works. I'm just wondering (and probably micro-optimizing) whether a rewrite like the following, which uses a more naïve cache lookup plus double-checked locking, would be 'safe' (thread-safe, free of odd race conditions) and 'more optimal' (less unneeded locking, fewer threads waiting for a lock).

public SomeObject getObject(Identifier someIdentifier) {

    // just check the cache, reference equality is not relevant just yet.
    SomeObject cached = cache.get(someIdentifier);
    if (cached != null) {
        return cached;
    }        

    Identifier singletonInstance = getUniqueIdentifier(someIdentifier);
    synchronized (singletonInstance) {
        // re-check the cache here, in case another thread populated it between
        // the unsynchronized check and entering the synchronized block.
        cached = cache.get(singletonInstance);
        if (cached != null) {
            return cached;
        } else {
            SomeObject newInstance = createSomeObject(singletonInstance);
            cache.put(singletonInstance, newInstance);
            return newInstance;
        }
    }
}

You could say 'Just test it' or 'Just do a micro-benchmark', but testing multi-threaded bits of code isn't my strong point, and I doubt I'd be able to simulate realistic situations or accurately fake race conditions. Plus it'd take me half a day, whereas writing a SO question only takes me a few minutes :).

You are reinventing Google-Collections/Guava's MapMaker/ComputingMap (later Guava releases deprecated makeComputingMap in favor of CacheBuilder and LoadingCache):

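import com.google.common.base.Function;
import com.google.common.collect.MapMaker;
import java.util.concurrent.ConcurrentMap;

ConcurrentMap<Identifier, SomeObject> cache = new MapMaker().makeComputingMap(new Function<Identifier, SomeObject>() {
  public SomeObject apply(Identifier from) {
    return createSomeObject(from);
  }
});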

public SomeObject getObject(Identifier someIdentifier) {
  return cache.get(someIdentifier);
}

Interning is not necessary here: the computing map guarantees that only one thread computes the value for an absent key, while any other thread asking for the same key blocks and waits for that result. If you remove a key while it is being populated, the populating thread and any threads already waiting still get that result, but subsequent requests start the population again.
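On Java 8 and later (which postdates this answer), the JDK's own ConcurrentHashMap.computeIfAbsent gives similar "compute once, others wait" behavior without Guava. The sketch below is an illustration under assumptions: String stands in for Identifier and SomeObject, and createSomeObject is a hypothetical counting stand-in for the expensive call. One caveat from the ConcurrentHashMap documentation: the mapping function runs while part of the map is locked, so it should be short and must not modify the map, and a slow createSomeObject can delay updates to other keys.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ComputeIfAbsentCache {
    // String stands in for the question's Identifier and SomeObject types.
    private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    // Hypothetical expensive call; counts how often it actually runs.
    private String createSomeObject(String id) {
        creations.incrementAndGet();
        return "object-for-" + id;
    }

    public String getObject(String id) {
        // The function runs at most once per absent key; concurrent callers
        // for the same key wait until that value has been stored.
        return cache.computeIfAbsent(id, this::createSomeObject);
    }

    public int creationCount() {
        return creations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ComputeIfAbsentCache c = new ComputeIfAbsentCache();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> c.getObject("id-1"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(c.getObject("id-1") + " created " + c.creationCount() + " time(s)");
    }
}
```

Running main should report a single creation even though eight threads asked for the same key at once.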

If you do need interning, the same library provides the excellent Interner interface (created via Interners.newStrongInterner() or Interners.newWeakInterner()), which supports both strongly and weakly referenced instances.
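The question does not show how getUniqueIdentifier is implemented. As a JDK-only illustration of the idea (not Guava's code), a minimal strong interner can be built on ConcurrentHashMap.putIfAbsent. SimpleInterner is a hypothetical name, and unlike Guava's weak interner this sketch never releases its canonical instances.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical minimal strong interner: returns one canonical instance per
// equals()-equivalence class. Canonical instances are never evicted.
public final class SimpleInterner<T> {
    private final ConcurrentMap<T, T> canonical = new ConcurrentHashMap<>();

    public T intern(T sample) {
        // putIfAbsent is atomic: the first thread to register an equal value wins,
        // and every later caller gets that same instance back.
        T existing = canonical.putIfAbsent(sample, sample);
        return existing != null ? existing : sample;
    }

    public static void main(String[] args) {
        SimpleInterner<String> interner = new SimpleInterner<>();
        String a = new String("id-42");
        String b = new String("id-42");
        System.out.println(a == b);                                   // false
        System.out.println(interner.intern(a) == interner.intern(b)); // true
    }
}
```

Locking on the instance returned by intern() then behaves like synchronizing on the question's singletonInstance.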

Entering a synchronized block costs up to about 2 microseconds (not counting time spent waiting for another thread to release the lock). Unless you need to shave off even that, you may be better off with the simplest solution.

BTW, you can write the cache lookup more compactly:

SomeObject cached = cache.get(singletonInstance);
if (cached == null) 
   cache.put(singletonInstance, cached = createSomeObject(singletonInstance));
return cached;

If "cache" is a map (which I suspect it is), then this problem is quite different from a simple double-checked locking problem.

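If cache is a plain HashMap, the problem is actually much worse: your proposed "double-checked pattern" behaves much worse than ordinary double-checked locking on a single reference, because the unsynchronized get() can run while another thread is modifying the map. This can lead to ConcurrentModificationExceptions, incorrect values, or even an infinite loop. Note also that since each thread locks only its own Identifier, even the original version lets threads with different identifiers modify a plain HashMap at the same time.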

If it is based on a plain HashMap, I would suggest switching to a ConcurrentHashMap as a first step. With a ConcurrentHashMap, no explicit locking is needed on your part:

public SomeObject getObject(Identifier someIdentifier) {
    // cache is a ConcurrentHashMap

    // just check the cache, reference equality is not relevant just yet.
    SomeObject cached = cache.get(someIdentifier);
    if (cached != null) {
        return cached;
    }        

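    Identifier singletonInstance = getUniqueIdentifier(someIdentifier);
    // Without a lock, two threads that miss at the same time may both call
    // createSomeObject; putIfAbsent keeps only the first result.
    SomeObject newInstance = createSomeObject(singletonInstance);
    SomeObject old = cache.putIfAbsent(singletonInstance, newInstance);
    if (old != null) {
        newInstance = old; // another thread won the race; use its instance
    }
    return newInstance;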
}
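The putIfAbsent version above can still call createSomeObject more than once for the same key when threads miss simultaneously, which is what the question was trying to avoid. A well-known refinement (the Memoizer example from Java Concurrency in Practice) caches a FutureTask instead of the value, so only the thread that installs the task runs the expensive call and the others wait on its result. This is a sketch under assumptions: String replaces Identifier and SomeObject, createSomeObject is a hypothetical counting stand-in, and the lambda syntax needs Java 8 (an anonymous Callable works on Java 5).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class FutureCache {
    // String stands in for the question's Identifier and SomeObject types.
    private final ConcurrentMap<String, Future<String>> cache = new ConcurrentHashMap<>();
    private final AtomicInteger creations = new AtomicInteger();

    // Hypothetical expensive call (external resource + parsing in the question).
    private String createSomeObject(String id) {
        creations.incrementAndGet();
        return "object-for-" + id;
    }

    public String getObject(String id) {
        Future<String> f = cache.get(id);              // fast path: no locking
        if (f == null) {
            FutureTask<String> task = new FutureTask<>(() -> createSomeObject(id));
            f = cache.putIfAbsent(id, task);           // at most one task per key gets installed
            if (f == null) {                           // this thread won: run the computation
                f = task;
                task.run();
            }
        }
        try {
            return f.get();                            // other threads block until the result is ready
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();        // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting for " + id, e);
        } catch (ExecutionException e) {
            cache.remove(id, f);                       // do not cache failures; a later call retries
            throw new IllegalStateException(e.getCause());
        }
    }

    public int creationCount() {
        return creations.get();
    }

    public static void main(String[] args) throws InterruptedException {
        FutureCache c = new FutureCache();
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> c.getObject("id-1"));
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(c.getObject("id-1") + " created " + c.creationCount() + " time(s)");
    }
}
```

Unlike computeIfAbsent, no map lock is held while createSomeObject runs, so a slow call for one key does not delay updates to other keys.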