Read a very, very big file with Python

Developer https://www.devze.com 2023-02-16 14:13 Source: web
What is the best solution to process each line of a text file whose size is about 500 MB?

Here is the approach I had in mind:

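def files(mon_fichier):
    # Yield successive blocks of up to 1024 characters until end of file.
    while True:
        data = mon_fichier.read(1024)
        if not data:
            break
        yield data

# The with statement closes the file even if an error occurs.
with open('tonfichier.txt', 'r') as fichier:
    for bloc in files(fichier):
        print(bloc)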

Thank you in advance


with open('myfile.txt') as inf:
    for line in inf:
        # do something
        pass


Standard file operations work fine as long as you avoid readlines(), which loads every line into a list in memory at once, and use readline() or iterate over the file object instead.
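To make the line-by-line advice concrete, here is a minimal sketch. The function name count_matching_lines and the small temporary file are assumptions made for illustration only; a 500 MB file would be read the same way, one line at a time.

```python
import os
import tempfile


def count_matching_lines(path, needle):
    """Count lines containing `needle`, keeping only one line in memory at a time."""
    count = 0
    with open(path, "r") as inf:
        for line in inf:  # the file object yields lines lazily
            if needle in line:
                count += 1
    return count


# Small stand-in file for the demonstration (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\nalpha beta\n")
    demo_path = tmp.name

matches = count_matching_lines(demo_path, "alpha")
os.remove(demo_path)
print(matches)
```

Because the loop never builds a list of all lines, memory use stays roughly proportional to the longest line rather than to the file size.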


The answer depends on what you want to do with the data. I recommend reading block by block and processing each block right after reading it, like this:

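source = 'myfile.txt'  # path to the input file (placeholder)
with open(source, 'r') as fs:
    while True:
        txt = fs.readline(1000)  # at most one line, capped at 1000 characters
        if txt == "":
            break  # an empty string means end of file
        # <your treatment of txt goes here>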
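If fixed-size blocks are what you need, the read-then-process loop can also be sketched as a generator built on the two-argument form of iter(). This is one possible way to write it, not the only one; the name read_blocks, the block size, and the temporary file are assumptions for illustration. Note that it uses read() rather than readline(), so blocks ignore line boundaries.

```python
import functools
import os
import tempfile


def read_blocks(path, block_size=1024):
    """Yield successive blocks of at most `block_size` bytes from `path`."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) calls f.read(block_size) repeatedly
        # and stops when it returns b"" (end of file).
        for block in iter(functools.partial(f.read, block_size), b""):
            yield block


# Small stand-in file for the demonstration (hypothetical content).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"a" * 2500)
    demo_path = tmp.name

sizes = [len(block) for block in read_blocks(demo_path, 1024)]
os.remove(demo_path)
print(sizes)
```

Only the final block is shorter than block_size, which is how the caller can tell the file ended partway through a block.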


As far as I understand it, file reads go through a buffer.

So mon_fichier.read(1024) does not fetch 1024 bytes directly from the file. It takes them from the buffer until the buffer is exhausted, and then the buffer is refilled by a real read of, say, 4096 or 8192 or 16384 bytes. I don't know the exact size (I think it is a power of 2, but I am not even sure of that).
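For what it is worth, the buffer size can be inspected and overridden. In Python 3, open() chooses a buffer size heuristically and falls back to io.DEFAULT_BUFFER_SIZE, whose value depends on the Python version. The sketch below assumes a binary-mode file; the 65536 figure and the temporary file are arbitrary choices for illustration.

```python
import io
import os
import tempfile

# Fallback buffer size used by open(); the exact value varies by Python version.
default_size = io.DEFAULT_BUFFER_SIZE
print(default_size)

# Small stand-in file for the demonstration (hypothetical content).
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"x" * 10000)
    demo_path = tmp.name

# buffering=65536 requests a 64 KiB buffer instead of the default.
with open(demo_path, "rb", buffering=65536) as f:
    is_buffered = isinstance(f, io.BufferedReader)
    first = f.read(1024)  # served from the buffer, not a 1024-byte disk read
    rest = f.read()       # everything that remains

os.remove(demo_path)
print(is_buffered, len(first), len(rest))
```

In practice the default buffer is usually adequate, so changing it is worth doing only after measuring.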

So if you really want to process blocks of bytes, I think philnext's code is preferable. But readline(1000) must be replaced with read(1000) if you want to fetch exactly 1000 bytes (fewer only at the end of the file): readline(1000) returns at most one line, even if that line is only 4 characters long.
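The difference between the two calls is easy to check on a small sample. The sketch below uses io.StringIO as a stand-in for a text file opened with open(), which is an assumption for illustration; in text mode the size argument counts characters, in binary mode it counts bytes.

```python
import io

# A 4-character line followed by a 20-character line.
sample = io.StringIO("abcd\n" + "x" * 20 + "\n")

line_piece = sample.readline(1000)  # stops at the end of the first line
sample.seek(0)
block = sample.read(10)             # takes 10 characters, crossing the newline
sample.seek(0)
capped = sample.readline(3)         # a long enough line is cut at the size limit

print(repr(line_piece), repr(block), repr(capped))
```

So readline(n) treats n as an upper bound on a single line, while read(n) returns n characters whenever that many remain.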

Processing a file in blocks may be what you really want, but it seems uncommon to me. Processing a file line by line is more frequent, and in that case Hugh Bothwell's code is the right way to do it.
