what is the best identifier when caching Sets ? When happens hashCode calculation of a Set?_问答_开发者

what is the best identifier when caching Sets ? When happens hashCode calculation of a Set?

开发者 https://www.devze.com 2023-03-19 01:35 出处：网络

What is the best way to implement a cache for Sets? Particularly, what makes the best key for the cache?

In a static factory method, I want to include a caching mechanism, so that I can reuse existing (immutable) objects. This reuse should not come with a significant performance penalty. The critical data of this class i开发者_JS百科s a parametrized LinkedHashSet. I'm wondering if it's wise to use the hashCode of this Set as key for the cache (HashMap), because in the java documentation it says: "The hash code of a set is defined to be the sum of the hash codes of the elements in the set". Isn't this potentially a slow process? When is it calculated? As soon as the Set is generated or on demand? Couldn't this actually eat up lots the performance gains I expect to gain by caching?

Furthermore, hashCode is an int, but HashMaps don't accept primitives, so this involes boxing to Integer, right?

My current approach, would be to maintain an additional set of the lengthes of sets of the existing objects. The factory method would first check if the current set's lenght is listed, only then looks up in the actual index. But this also involves boxing...

Is there a better solution?

You need to use some invariant as the key for each set, something that logically defines the contents of that set.

Consider creating a NamedSet either wrapping your existing set implementation with a simple Delegator, or subclassing it (if it is not final). Then you can provide an additional key or name field to identify the set and use that as the key for your cache.

Isn't this potentially a slow process? When is it calculated? As soon as the Set is generated or on demand? Couldn't this actually eat up lots the performance gains I expect to gain by caching?

In principle, this is not specified in the Set interface, thus it depends on the implementation.

For the general-purpose Set implementations in java.util and java.util.concurrent (as well as the set views of the general purpose maps), hashCode() is calculated on demand, and will be slow for bigger sets. (For small sets with simple elements, this does not really matter.)

The reason is that the hashCode (as well as equals) as defined is dynamic, e.g. changes whenever an element is added or removed, and changes also if the hashCode of an element changes (which is problematic by itself for hash-based sets). Thus, normally a Set/List/Map is not really a good key for a map.

For an immutable set (which is also in practice the only type of set which is really suitable as a map key), the hash code could be calculated once (either on creation or on the first use) and then cached (like String does it).

One could also implement such a caching for mutable sets, as long as the hash codes of the elements don't change: The formula is simple enough that it is possible to update the value on every addition or removal without inspecting anything than the added/removed element. But make sure the set does not change while it is used as a key in a map.

(Most of this also applies to List and Map with their similar hashCode() formulas.)