
Is there a kind of map that is optimized for *sequences of keys* that have the same value?

Developer https://www.devze.com 2023-03-18 07:30 Source: Internet

If you are mapping Java shorts to a few immutable objects, and it is often the case that a consecutive sequence of short keys (neighbors) maps to the same value, is there some map structure that lets you save more memory than a HashMap while keeping fast access (O(1) or O(log(N)))?

I could invert the map (value to set of keys) and use much less memory, but then I would have to go through every mapping to find out whether a specific short is mapped, and to what (O(N)).

I suppose some kind of treemap could do that; maybe there is something like that in some collection library?


Have a look at interval trees.


I once used a TreeMap with a custom key class and corresponding comparator to implement this. My key class contained both ends of a range of double values. Queries were specified as a range with both ends being the same and the comparator did the rest.

There were a few choices to be made, though:

  • How should remove() be handled?

  • What should happen if a get() is issued with a key range that overlaps two or more ranges?

  • Would it make sense to bundle this behaviour in a new Map implementation - possibly a subclass of TreeMap?
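
To make the range-key idea concrete, here is a minimal sketch, not the original code from this answer. It assumes short keys (the answer used doubles), inclusive ranges, and stored ranges that never overlap; the names Range, point and BY_OVERLAP are made up for illustration. The comparator treats overlapping ranges as equal, which is what lets a single-point range find the stored range containing it.

```java
import java.util.Comparator;
import java.util.TreeMap;

public class RangeKeyDemo {
    /** Inclusive key range (hypothetical helper); a lookup uses a range with lo == hi. */
    static final class Range {
        final short lo, hi;
        Range(short lo, short hi) { this.lo = lo; this.hi = hi; }
        static Range point(short k) { return new Range(k, k); }
    }

    // Overlapping ranges compare as equal. This is only a consistent ordering
    // as long as the ranges stored in the map do not overlap each other.
    static final Comparator<Range> BY_OVERLAP = (a, b) ->
            a.hi < b.lo ? -1 : (a.lo > b.hi ? 1 : 0);

    public static void main(String[] args) {
        TreeMap<Range, String> map = new TreeMap<>(BY_OVERLAP);
        map.put(new Range((short) 0, (short) 99), "low");
        map.put(new Range((short) 100, (short) 500), "high");

        System.out.println(map.get(Range.point((short) 42)));   // low
        System.out.println(map.get(Range.point((short) 100)));  // high
        System.out.println(map.get(Range.point((short) 600)));  // null
    }
}
```

Note that a put() with a range overlapping an existing one silently replaces that entry's value while keeping the old key, which is one reason the questions above need an explicit answer before wrapping this in a Map implementation.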


You can use a binary tree with one entry for each interval of shorts that map to the same value. The key would be the start of the interval, while the data is the length of the interval plus the mapped object.

Thus, to find out whether a given short is mapped, you locate the node with the highest key less than or equal to the given one (O(log n)) and check whether the given short falls within the interval that node represents.
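
The lookup described above maps directly onto java.util.TreeMap.floorEntry. The following is only a sketch under a few assumptions: runs never overlap, the class and method names (ShortRunMap, Run, putRun) are invented here, and each node stores the inclusive last key of the run rather than its length, which carries the same information.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: maps runs of consecutive short keys to one shared value. */
public class ShortRunMap<V> {
    /** One run: its inclusive last key plus the shared value (hypothetical helper). */
    private static final class Run<V> {
        final short last;
        final V value;
        Run(short last, V value) { this.last = last; this.value = value; }
    }

    // Keyed by the first short of each run.
    private final TreeMap<Short, Run<V>> runs = new TreeMap<>();

    /** Registers the run [first, last]; assumes it overlaps no existing run. */
    public void putRun(short first, short last, V value) {
        if (first > last) throw new IllegalArgumentException("first > last");
        runs.put(first, new Run<>(last, value));
    }

    /** Returns the value mapped to key, or null if key lies in no run. */
    public V get(short key) {
        // floorEntry finds the run with the greatest start <= key in O(log n).
        Map.Entry<Short, Run<V>> e = runs.floorEntry(key);
        if (e == null || key > e.getValue().last) return null;
        return e.getValue().value;
    }
}
```

Memory use then grows with the number of runs instead of the number of mapped keys, so the saving depends on how long the runs typically are.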


This solution is pretty different - very old-fashioned, but approaching O(1), small and fast. 90% of the values will fit into 4 bits, whereas a map or tree entry takes hundreds of bits to represent (without a lot of custom reimplementation). So start by representing them in an array of 4-bit entries:

// Used to store nybbles containing small values, with direct arithmetic mapping.
// A value of 15 indicates that the value is larger than 14.
// Size: 32KB
byte[] zeroTo14Array = new byte[(1<<Short.SIZE)/2];
static final short BIGGER_THAN_NYBBLE = 15;
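
Since the answer leaves out the accessors, here is one possible way to read and write the packed 4-bit entries. This is a sketch, not the author's unposted code: the class name NybbleArray, the get/set methods, and the choice to put even indexes in the low nybble are all assumptions.

```java
public class NybbleArray {
    static final int BIGGER_THAN_NYBBLE = 15;

    // Two 4-bit entries per byte: 65536 possible keys / 2 = 32768 bytes.
    private final byte[] zeroTo14Array = new byte[(1 << Short.SIZE) / 2];

    /** Returns the 4-bit entry (0-15) stored for key. */
    public int get(short key) {
        int index = key & 0xFFFF;                 // treat the short as unsigned: 0..65535
        int b = zeroTo14Array[index >>> 1] & 0xFF;
        // Even indexes use the low nybble, odd indexes the high nybble.
        return (index & 1) == 0 ? (b & 0x0F) : (b >>> 4);
    }

    /** Stores a 4-bit entry (0-15) for key without disturbing its neighbor. */
    public void set(short key, int nybble) {
        if (nybble < 0 || nybble > 15) {
            throw new IllegalArgumentException("not a nybble: " + nybble);
        }
        int index = key & 0xFFFF;
        int b = zeroTo14Array[index >>> 1] & 0xFF;
        b = (index & 1) == 0
                ? ((b & 0xF0) | nybble)
                : ((b & 0x0F) | (nybble << 4));
        zeroTo14Array[index >>> 1] = (byte) b;
    }
}
```

A caller would check whether get(key) returns BIGGER_THAN_NYBBLE and only then fall through to the larger maps.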

Then use an efficient short-to-byte map (from fastutil or GNU Trove) to represent the values from 15 to 255:

// Used to store bytes with values 15-255.
// If value is 0, value is larger than 255.
Short2ByteOpenHashMap byteMap = new Short2ByteOpenHashMap();

Finally, use an efficient short-to-object map for everything else:

// Used to store values larger than 255
Short2ObjectOpenHashMap<Value> objectMap = new Short2ObjectOpenHashMap<>();

// just a sketch
public class Value
{
    short shortValue;
    String optional;
}

I can post the rest of the untested code, if you'd like.
