
Does the Python "open" function save its content in memory or in a temp file?

For the following Python code:

fp = open('output.txt', 'wb')
# Very big file, writes a lot of lines, n is a very large number
for i in range(1, n):
    fp.write('something' * n)
fp.close()

The writing process above can last more than 30 min. Sometimes I get the error MemoryError. Is the content of the file before closing stored in memory or written in a temp file? If it's in a temporary file, what is its general location on a Linux OS?

Edit:

Added fp.write in a for loop


It's stored in the operating system's disk cache in memory until it is flushed to disk, either implicitly due to timing or space issues, or explicitly via fp.flush().
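
For reference, a minimal sketch of pushing the data out explicitly (the b'...' literal is only because the question opens the file in binary mode; os.fsync is optional and matters only if you also want the kernel's cached pages committed to disk):

import os

fp = open('output.txt', 'wb')
fp.write(b'something\n')    # sits in Python's own userspace buffer first
fp.flush()                  # hand the buffered bytes to the OS page cache
os.fsync(fp.fileno())       # ask the kernel to commit the cached pages to disk
fp.close()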


There is write buffering in the Linux kernel, but those buffers are flushed to disk at (ir)regular intervals. Running out of such buffer space should never cause an application-level memory error; the buffers should empty before that happens, pausing the application while they do.


Building on ataylor's comment to the question:

You might want to nest your loop. Something like

for i in range(1, n):
    for j in range(n):
        fp.write('something')
fp.close()

That way, the only thing that gets put into memory is the string "something", not "something" * n.


If you are writing out a large file for which the writes might fail, you are better off flushing the file to disk yourself at regular intervals using fp.flush(). That way the file will be in a location of your choosing that you can easily get to, rather than being at the mercy of the OS:

fp = open('output.txt', 'wb')
counter = 0
for line in many_lines:
    fp.write(line)
    counter += 1
    if counter > 999:
        fp.flush()
        counter = 0
fp.close()

This will flush the file to disk every 1000 lines.
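
Alternatively, the built-in open() takes a buffering argument, so you can bound how much data sits in Python's own buffer instead of flushing by hand. A small sketch; the 64 KiB size and the placeholder value of n are arbitrary choices, not from the question:

n = 1_000_000            # placeholder for the question's large n
# Cap Python's userspace buffer at 64 KiB; once it fills, data is handed
# to the OS automatically. buffering=0 would disable Python's buffering
# entirely (allowed in binary mode only).
with open('output.txt', 'wb', buffering=64 * 1024) as fp:
    for i in range(1, n):
        fp.write(b'something\n')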


If you write line by line, it should not be a problem. You should show the code of what you do before the write. For a start, you can try deleting objects that are no longer needed, using fp.flush(), etc.


File writing should never give a memory error; in all probability, you have a bug somewhere else.

If you have a loop and a memory error, then I would check whether you are "leaking" references to objects.
Something like:

def do_something(a, b=[]):
    # The default list is created once and shared by every call,
    # so it keeps growing: a reference "leak".
    b.append(a)
    return b

fp = open('output.txt', 'wb')

for i in range(1, n):
    something = do_something(i)          # the same ever-growing list each time
    fp.write(str(something).encode())

fp.close()

This is just a made-up example; in your actual case the reference leak may be much harder to find. Here, however, memory leaks inside do_something simply because of the way Python handles mutable default parameters.
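
For completeness, the usual fix is to use None as a sentinel so a fresh list is created on every call; a small sketch:

def do_something(a, b=None):
    # A new list is created per call instead of reusing one shared default.
    if b is None:
        b = []
    b.append(a)
    return b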

