I have been assigned the task of reading a .txt file which is a log of various events and writing some of those events into a dictionary.
The problem is that the file can sometimes exceed 3 GB, which means the dictionary grows too large to fit into main memory. It seems that shelve is a good way to solve this problem. However, since I will be constantly modifying the dictionary, I must have the writeback option enabled. This is where I am concerned: the tutorial says that writeback slows down reading and writing and uses more memory, but I cannot find any figures on how much speed and memory are actually affected.
Can anyone clarify how much read/write speed and memory usage are affected, so that I can decide whether to use the writeback option or sacrifice some readability for code efficiency?
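The Python `shelve` documentation explains the cost qualitatively: with `writeback=True`, every entry accessed since the last `sync()` is cached in memory, and `sync()`/`close()` re-pickles all of them, because shelve cannot tell which entries were actually mutated. The sketch below contrasts the two styles when loading a log, assuming Python 3 and a hypothetical line format `"<key> <event text>"`; `load_events`, `demo` and `sync_every` are illustrative names, not part of any library.

```python
import os
import shelve
import tempfile


def load_events(lines, path, writeback, sync_every=10000):
    """Group log lines into per-key event lists stored in a shelf.

    Assumes a hypothetical log format "<key> <event text>" per line;
    adapt the partition() call to the real log format.
    """
    with shelve.open(path, writeback=writeback) as db:
        for n, line in enumerate(lines, 1):
            key, _, event = line.rstrip("\n").partition(" ")
            if writeback:
                # In-place mutation is tracked: the list lives in the shelf's
                # writeback cache until sync()/close() pickles it back.
                db.setdefault(key, []).append(event)
                if n % sync_every == 0:
                    db.sync()  # write cached entries out and empty the cache
            else:
                # Without writeback, db[key] returns a fresh unpickled copy,
                # so the modified list must be assigned back to be persisted.
                events = db.get(key, [])
                events.append(event)
                db[key] = events


def demo():
    """Load a tiny sample log both ways and read the results back."""
    sample = ["login alice", "login bob", "logout alice"]
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for wb in (True, False):
            path = os.path.join(tmp, f"events_wb{wb}")
            load_events(sample, path, writeback=wb, sync_every=2)
            with shelve.open(path, flag="r") as db:
                results[wb] = dict(db)
    return results
```

Both branches end up storing the same data. The non-writeback branch costs two extra lines per update, and the tempting one-liner `db.setdefault(key, []).append(event)` would leave every list empty there. Calling `sync()` every `sync_every` updates keeps the writeback cache bounded instead of letting it grow toward the size of the whole file.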
Thank you
For databases of this size, shelve really is the wrong tool. If you do not need a highly available client/server architecture, and you just want to convert your TXT file into a local database that you can access much like an in-memory dictionary, you really should be using ZODB.
If you need something highly available, you will of course need to switch to a dedicated "NoSQL" database, of which there are many to choose from.
Here is a simple example of how to convert your shelve database into a ZODB database, which should solve your memory usage and performance problems:
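#!/usr/bin/env python3
# Copy every key/value pair from a shelve file into a new ZODB FileStorage.
import argparse
import shelve

import transaction
import ZODB
import ZODB.FileStorage
from BTrees.OOBTree import OOBTree

parser = argparse.ArgumentParser(description="Convert a shelve database to ZODB")
parser.add_argument("-i", "--input", dest="in_file", required=True,
                    help="original shelve database filename")
parser.add_argument("-o", "--output", dest="out_file", required=True,
                    help="new ZODB database filename")
args = parser.parse_args()  # exits with a usage message if -i or -o is missing

# Open the shelf read-only: copying never mutates values, and writeback=True
# would only make shelve cache every value it reads.
db = shelve.open(args.in_file, flag="r")

zstorage = ZODB.FileStorage.FileStorage(args.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
root = zconnection.root()

# Store the data in a BTree rather than directly on the root mapping: the root
# is a single persistent object that would be re-pickled whole on every commit,
# while a BTree only rewrites the buckets that changed.
newdb = root["data"] = OOBTree()

for n, (key, value) in enumerate(db.items(), 1):
    print("Copying key: " + str(key))
    newdb[key] = value
    if n % 10000 == 0:
        transaction.commit()  # commit in batches to keep each transaction small
transaction.commit()

zconnection.close()
zdb.close()
db.close()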