For the following Python code:
fp = open('output.txt', 'wb')
# Very big file, writes a lot of lines, n is a very large nu开发者_JAVA技巧mber
for i in range(1, n):
fp.write('something' * n)
fp.close()
The writing process above can last more than 30 min. Sometimes I get the error MemoryError
. Is the content of the file before closing stored in memory or written in a temp file? If it's in a temporary file, what is its general location on a Linux OS?
Edit:
Added fp.write in a for loop
It's stored in the operating system's disk cache in memory until it is flushed to disk, either implicitly due to timing or space issues, or explicitly via fp.flush()
.
There will be write buffering in the Linux kernel, but at (ir)regular intervals they will be flushed to disk. Running out of such buffer space should never cause an application-level memory error; the buffers should empty before that happens, pausing the application while doing so.
Building on ataylor's comment to the question:
You might want to nest your loop. Something like
for i in range(1,n):
for each in range n:
fp.write('something')
fp.close()
That way, the only thing that gets put into memory is the string "something"
, not "something" * n
.
If you a writing out a large file for which the writes might fail you a better off flushing the file to disk yourself at regular intervals using fp.flush()
. This way the file will be in a location of your choosing that you can easily get to rather than being at the mercy of the OS:
fp = open('output.txt', 'wb')
counter = 0
for line in many_lines:
file.write(line)
counter += 1
if counter > 999:
fp.flush()
fp.close()
This will flush the file to disk every 1000 lines.
If you write line by line, it should not be a problem. You should show the code of what you are doing before the write. For a start you can try to delete objects where not necessary, use fp.flush()
etc..
File writing should never give a memory error; with all probability, you have some bug in another place.
If you have a loop, and a memory error, then I would look if you are "leaking" references to objects.
Something like:
def do_something(a, b = []):
b.append(a)
return b
fp = open('output.txt', 'wb')
for i in range(1, n):
something = do_something(i)
fp.write(something)
fp.close()
I am now picking just an example, but in your actual case the reference leak may be much more difficult to find; however this case will just leak memory inside do_something
because of the way Python handles default parameters of functions.
精彩评论