I have a lot of objects which form a network by keeping references to other objects. All objects (nodes) have a dict which is their properties.
Now I'm looking for a fast way to store these objects (in a file?) and reload all of them into memory later (I don't need random access). The data takes about 300 MB in memory and 40 s to load from my SQL format, so I now want to cache it for faster access.
Which method would you suggest?
(my pickle attempt failed due to recursion errors despite trying to mess around with getstate :( maybe there is something fast anyway? :))
Pickle would be my first choice. But since you say that it didn't work, you might want to try shelve, even though it's not shelve's primary purpose.
Really, you should be using Pickle for this. Perhaps you could post some code so that we can take a look and figure out why it doesn't work?
"The pickle module keeps track of the objects it has already serialized, so that later references to the same object won’t be serialized again." So it IS possible. Perhaps increase the recursion limit with sys.setrecursionlimit
See the related question: Hitting Maximum Recursion Depth Using Python's Pickle / cPickle
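To make the suggestion concrete, here is a minimal sketch of pickling a cyclic object graph with a temporarily raised recursion limit. The `Node` class, the helper names `dump_graph` and `load_graph`, and the default limit of 10000 are assumptions for illustration, not the asker's code. How much a higher limit helps depends on how deep the graph is and on the Python version, and setting it too high can crash the interpreter instead of raising RecursionError.

```python
import pickle
import sys


class Node:
    """Hypothetical node type: a property dict plus direct references to other nodes."""

    def __init__(self, name):
        self.name = name
        self.props = {}
        self.neighbors = []


def dump_graph(nodes, path, limit=10000):
    """Pickle a list of nodes, raising the recursion limit only while dumping."""
    old_limit = sys.getrecursionlimit()
    sys.setrecursionlimit(max(old_limit, limit))  # assumed value, tune for your graph
    try:
        with open(path, "wb") as f:
            # pickle memoizes objects it has already written, so cycles are fine
            pickle.dump(nodes, f, protocol=pickle.HIGHEST_PROTOCOL)
    finally:
        sys.setrecursionlimit(old_limit)


def load_graph(path):
    """Load the whole list of nodes back into memory."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Dumping one list that contains every node keeps shared references shared after loading, because everything goes through a single pickle call.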
Perhaps you could set up some layer of indirection where the objects are actually held within, say, another dictionary, and an object referencing another object will store the key of the object being referenced and then access the object through the dictionary. If the object for the stored key is not in the dictionary, it will be loaded into the dictionary from your SQL database, and when it doesn't seem to be needed anymore, the object can be removed from the dictionary/memory (possibly with an update to its state in the database before the version in memory is removed).
This way you don't have to load all the data from your database at once, and can keep a number of the objects cached in memory for quicker access to those. The downside would be the additional overhead required for each access to the main dict.
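A rough sketch of that indirection, assuming a hypothetical `load_node` callback that stands in for the SQL lookup (the class and method names here are made up for illustration):

```python
class Node:
    """Hypothetical node that stores the keys of its neighbors, not the objects."""

    def __init__(self, key, props, neighbor_keys):
        self.key = key
        self.props = props
        self.neighbor_keys = neighbor_keys


class NodeStore:
    """Dictionary-backed cache that loads missing nodes on demand."""

    def __init__(self, load_node):
        # load_node is an assumed callable: key -> Node (e.g. a database query)
        self._load_node = load_node
        self._cache = {}

    def get(self, key):
        node = self._cache.get(key)
        if node is None:
            node = self._load_node(key)  # cache miss: fetch from the backing store
            self._cache[key] = node
        return node

    def neighbors(self, node):
        # every hop goes through the store, which is the extra per-access cost
        return [self.get(k) for k in node.neighbor_keys]

    def evict(self, key):
        # drop a node from memory; writing changes back first is left to the caller
        self._cache.pop(key, None)
```

Because each node holds only keys, pickling the cached nodes does not have to follow long reference chains, so this layout may also sidestep the recursion problem if you still want a pickle cache.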