The objective is to build very large trees. By very large I mean hun开发者_StackOverflow中文版dreds of millions of nodes, fitting in a few gigabytes.
The issue is that common data structures have way too much overhead. I cannot afford to have "node" objects and children "maps". I need to directly encode it into memory in a very compact way.
Therefore, I was wondering if there existed some memory efficient implementation of trees having integers as key and values, without using objects internally, therefore needing (4 byte for key + 4 bytes for value + 4 bytes for children index + a few bytes for free hashing space = 15 bytes per entry on average) which would allow me to use an external mapping int<->keys and int<->values to search the tree.
Anyone?
PS: Using objects internally uses at least 5 times more space: 8 reference + 4 extra hash space + 16 object header + 8 key ref + 8 value ref + 8 parent ref + 8 children ref + (16 + x) for children map obj = nearly 76+x bytes per entry. (for instance, our default implementation needed around 100 bytes per entry)
That's actually not a Java specific question but more of a general concept.
Try this: http://webdocs.cs.ualberta.ca/~holte/T26/tree-as-array.html
The key would be to use arrays of primitives, in order to avoid object overhead.
I don't know of any specific tree implementation that does exactly that, but VTD-XML represents an XML tree (the DOM) internally using an array of tokens with pointers into a buffer. Perhaps you can get inspired by their solution?
You may find that this library helps you achieve what you want - it is specifically designed for storing values as primitives and does some bytecode hackery behind the scenes to give the illusion of storing Objects. Use it when...
... you want to efficiently store large collections of data in memory. This library can dramatically reduce Full GC times and reduce memory consumption as well.
It doesn't have a specific Tree collection, but it might do the trick, depending on what you need.
http://code.google.com/p/vanilla-java/wiki/HugeCollections
I don't think you'll find anything already implemented for you, but what you've described could be very easily implemented using an IntBuffer. You'd create a "wrapper" class that converts indexes into records in the buffer, and instantiate/discard those wrappers as needed (ie, as you're traversing a tree, you probably don't care about holding a reference to the parent).
There are a couple of concerns:
- Garbage collection of the wrapper instances: as long as they're short-lived, they never get out of Eden, so GC is almost free.
- Garbage collection of records within the buffer: if you have a write-once tree, this is no problem. Otherwise, you'll need to maintain a free list. Not difficult, but it takes some time.
- General mechanics of implementing a tree: this is already done for you with classes like
TreeMap
. But the algorithms are pretty simple, and available from Wikipedia.
Instead of keeping a list of children, every node could have a reference to its parent. Thus serializing a node wouldn't need more than three integer values (parent, key, value).
A problem with this approach is tree traversal. Obtaining a definite list of all node children would require iterating through all nodes. If the tripplettes are sorted by their parent values traversal may be improved. Adding one more integer value, i.e. a pointer to the next key would allow keeping the nodes in a linked list easing the job of insertion and removal of nodes.
精彩评论