I am looking for a way to store several GB of data in memory. The data is loaded into a tree structure. I want to be able to access this data through my main function, but I'm not interested in reloading the data into the tree every time I run the program. What is the best way to do this? Should I create a separate program for loading the data and then call it from the main function, or are there better alternatives?
Thanks, Mads
I'd say the best alternative would be a database, which would then be your "separate program for loading the data".
If you are using a POSIX-compliant system, take a look at mmap.
On Windows, the equivalent way to memory-map a file is CreateFileMapping together with MapViewOfFile.
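To make the idea concrete, here is a minimal sketch in Python, whose mmap module wraps the platform's file-mapping facility on both POSIX and Windows. The file name, the record layout (1000 little-endian 64-bit integers) and the read_record helper are assumptions made up for this illustration, not part of the question.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical data file: 1000 little-endian 64-bit integers (i squared).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    for i in range(1000):
        f.write(struct.pack("<q", i * i))

def read_record(mm, index):
    """Return the 64-bit integer stored at position `index` in the mapping."""
    (value,) = struct.unpack_from("<q", mm, index * 8)
    return value

with open(path, "rb") as f:
    # Length 0 maps the whole file. The OS pages data in on access,
    # so only the parts that are actually touched get read from disk.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        value_at_10 = read_record(mm, 10)
        last_value = read_record(mm, 999)
```

Because the mapping is backed by the OS page cache, a second run of the program shortly afterwards will usually find most of the file already in memory.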
You could probably solve this using shared memory: have one long-lived process build the tree and expose its address, so that other processes that start up can get hold of that same memory for querying. Note that in that case you will need to make sure the tree can safely be read by multiple simultaneous processes. If the reads are really just pure reads, then that should be easy enough.
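As a rough sketch of the shared-memory approach, the following uses Python's multiprocessing.shared_memory module (Python 3.8 or later). The payload is a placeholder for the real tree data, and the "reader" attaches from the same process only to keep the example self-contained; in practice it would be a separate program that is given the block's name.

```python
from multiprocessing import shared_memory

# Placeholder for the real tree data (an assumption for this sketch).
payload = b"serialized tree bytes"

# Long-lived loader: create a named block and copy the data in once.
owner = shared_memory.SharedMemory(create=True, size=len(payload))
owner.buf[:len(payload)] = payload

# A query process only needs the block's name to attach to it.
# Here the attach happens in the same process to keep the sketch runnable.
reader = shared_memory.SharedMemory(name=owner.name)
seen_by_reader = bytes(reader.buf[:len(payload)])

# Each side closes its own handle; the owner removes the block at the end.
reader.close()
owner.close()
owner.unlink()
```

Keep in mind that raw pointers stored inside the shared block are only meaningful if every process maps it at the same address, so offsets or indices within the block are usually the safer way to link nodes.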
You should look into a technique called memory-mapped files.
I think the best solution is to configure a cache server and put the data there.
Look into Ehcache:
Ehcache is an open source, standards-based cache used to boost performance, offload the database and simplify scalability. Ehcache is robust, proven and full-featured and this has made it the most widely-used Java-based cache.
It's written in Java, but should support any language you choose:
The Cache Server has two apis: RESTful resource oriented, and SOAP. Both support clients in any programming language.
You must be running a 64-bit system (and a 64-bit build of your program) for a single process to use more than 4 GB of memory. If you build the tree and store it in a global variable, you can access the tree and its data from any function in the program. I suggest you perhaps try an alternative method that consumes less memory. If you post what type of program this is and what type of tree you're using, I can perhaps help you find an alternative method.
Since you don't want to keep reloading the data, file storage and databases are out of the question, but several gigabytes of memory seems like a hefty price.
Also note that on Windows systems you can read the memory of another process using ReadProcessMemory(). You need a handle to that process opened with read access, plus the address of the memory you want to read.
Alternatively, you may implement the data loader as an executable program and the main program as a DLL that is loaded and unloaded on demand. That way you can keep the data in memory and still modify the processing code without reloading all the data or doing cross-process memory sharing.
Also, if you can operate on the raw data from disk without preprocessing it (e.g. putting it in a tree, manipulating pointers to its internals), you may want to memory-map the data and avoid loading unused portions of it.
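One way to get a tree that can be used straight from a mapped file is to store it as fixed-size records that refer to each other by index instead of by pointer. The sketch below is only an illustration under that assumption: the record layout (key, left index, right index), the helper names and the sample keys are all made up, and a real tree would carry a payload per node as well.

```python
import mmap
import os
import struct
import tempfile

# One node = key (int64), left child index (int32), right child index (int32).
NODE = struct.Struct("<qii")
NO_CHILD = -1

def build_nodes(sorted_keys):
    """Lay out a balanced binary search tree as records; the root is record 0."""
    nodes = []

    def build(lo, hi):
        if lo >= hi:
            return NO_CHILD
        mid = (lo + hi) // 2
        index = len(nodes)
        nodes.append(None)  # reserve the slot so parents precede their children
        left = build(lo, mid)
        right = build(mid + 1, hi)
        nodes[index] = (sorted_keys[mid], left, right)
        return index

    build(0, len(sorted_keys))
    return nodes

def contains(buf, key):
    """Search the tree directly in the mapped bytes, following child indices."""
    index = 0  # assumes a non-empty tree
    while index != NO_CHILD:
        node_key, left, right = NODE.unpack_from(buf, index * NODE.size)
        if key == node_key:
            return True
        index = left if key < node_key else right
    return False

# Done once: write the tree to disk (sample keys are the even numbers below 2000).
keys = list(range(0, 2000, 2))
path = os.path.join(tempfile.mkdtemp(), "tree.bin")
with open(path, "wb") as f:
    for record in build_nodes(keys):
        f.write(NODE.pack(*record))

# Done on every run: map the file and query it without rebuilding anything.
with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
    found_even = contains(mm, 1234)
    found_odd = contains(mm, 1235)
```

A lookup only touches the records on the path from the root to the key, so only those pages of the file need to be resident in memory.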