I used .readline() to parse the file line by line, because I need to find the start position at which to begin extracting data into a list, and the end point at which to pause extracting, then repeat until the end of the file. The file I am reading is formatted like this:
blabla... useless....
...
/sign/
data block (e.g. 10 cols x 1000 rows)
...
blank line
/sign/
data block (e.g. 10 cols x 1000 rows)
...
blank line
...
EOF
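For concreteness, a tiny made-up file in this layout (the values are purely illustrative; "(blank line)" marks an empty line that ends a block) could look like:

some header text we do not need
/sign/
1.0 2.0 3.0
4.0 5.0 6.0
(blank line)
/sign/
7.0 8.0 9.0
(blank line)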
Let's call this file 'myfile'. Here is my Python snippet:
f = open('myfile', 'r')
blocknum = 0  # number of the data block
data = []
while True:
    # find the beginning of the extract
    while not f.readline().startswith('/sign/'):
        pass
    # create a new sublist in the multidimensional list to store this data block
    data.append([])
    blocknum += 1
    line = f.readline()
    while line.strip():
        # a non-blank line is data; a blank line is the end of one block
        data[blocknum-1].append(["%2.6E" % float(x) for x in line.split()])
        line = f.readline()
    print "Read Block %d" % blocknum
    if not f.readline():
        break
Running it on a 500 MB file consumed almost 2 GB of RAM. I cannot figure out why; can somebody help? Thanks very much!
You have quite a few non-Pythonic, ambiguous lines in your code. I am not sure, but I think you can modify your code in the following way first and then check its memory usage again:
data = []
with open('myfile', 'r') as f:
    for line in f:
        # find the extract beginning - think you can add here more parameters to check
        if not line.strip() or line.startswith('/sign/'):
            continue
        data.append(["%2.6E" % float(x) for x in line.strip().split()])
But I think this code will also use quite a lot of memory. However, if you don't really need to keep all of the data read from the file, you can modify the code to use a generator and process the file data line by line - this should save memory, I guess.
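For example, a minimal generator sketch along those lines (iter_blocks is just a placeholder name; it assumes the block layout described in the question, with /sign/ starting a block and a blank line ending it) might look like this:

def iter_blocks(path):
    # hypothetical helper: yield one data block at a time, so only a
    # single block is held in memory instead of the whole file
    with open(path, 'r') as f:
        block = None
        for line in f:
            if line.startswith('/sign/'):
                block = []  # a marker line starts a new block
            elif block is None:
                continue    # skip header/filler lines before the first marker
            elif line.strip():
                block.append(["%2.6E" % float(x) for x in line.split()])
            else:
                yield block  # a blank line ends the current block
                block = None
        if block:
            yield block      # last block if the file ends without a blank line

for blocknum, block in enumerate(iter_blocks('myfile'), 1):
    print "Read Block %d (%d rows)" % (blocknum, len(block))
    # process the block here; it can then be garbage-collected before the next one

Whether this actually lowers the peak memory depends on the blocks being processed one at a time rather than accumulated in another list.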