Will Python open a file before it's finished writing?

Developer https://www.devze.com 2023-03-10 10:13 Source: web

I am writing a script that will be polling a directory looking for new files.

In this scenario, is it necessary to do some sort of error checking to make sure the files are completely written prior to accessing them?

I don't want to work with a file before it has been written completely to disk, but because the info I want from the file is near the beginning, it seems like it could be possible to pull the data I need without realizing the file isn't done being written.

Is that something I should worry about, or will the file be locked because the OS is writing to the hard drive?

This is on a Linux system.


Typically on Linux, unless you're using locking of some kind, two processes can quite happily have the same file open at once, even for writing. There are three ways of avoiding problems with this:

  1. Locking

    By having the writer apply a lock to the file, it is possible to prevent the reader from reading the file partially. However, most locks are advisory, so a reader that does not check the lock can still see partial results. (Mandatory locks exist, but are strongly discouraged on the grounds that they're far too fragile.) It's relatively difficult to write correct locking code, and it is normal to delegate such tasks to a specialist library (e.g., to a database engine!). In particular, you don't want to use locking on networked filesystems; it's a source of colossal trouble when it works and can often go thoroughly wrong.

  2. Convention

    A file can instead be created in the same directory with another name that you don't automatically look for on the reading side (e.g., .foobar.txt.tmp) and then renamed atomically to the right name (e.g., foobar.txt) once the writing is done. This can work quite well, so long as you take care to deal with the possibility of previous runs failing to correctly write the file. If there should only ever be one writer at a time, this is fairly simple to implement.

  3. Not Worrying About It

    The most common type of file that is frequently written is a log file. These can be easily written in such a way that information is strictly only ever appended to the file, so any reader can safely look at the beginning of the file without having to worry about anything changing under its feet. This works very well in practice.
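
The naming convention from option 2 could look roughly like this in Python. This is only a sketch under some assumptions: a single writer at a time, a temporary name of the form .foobar.txt.tmp as in the example above, and helper names (write_atomically, is_finished) that are made up for illustration.

```python
import os

def write_atomically(final_path, data):
    """Write data under a hidden temporary name, then rename it into place."""
    directory, name = os.path.split(final_path)
    # Same directory as the final file, so the rename stays on one file system.
    tmp_path = os.path.join(directory, "." + name + ".tmp")
    # Mode "w" truncates, so a stale temp file left by a failed run is overwritten.
    with open(tmp_path, "w") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before publishing the name
    # The rename is atomic: readers see either no file or the complete file.
    os.replace(tmp_path, final_path)

def is_finished(name):
    """Reader side: skip names that follow the temporary-file convention."""
    return not (name.startswith(".") and name.endswith(".tmp"))
```

A polling reader would then only open directory entries for which is_finished(name) is true.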

There's nothing special about Python in any of this. All programs running on Linux have the same issues.


On Unix, unless the writing application goes out of its way, the file won't be locked and you'll be able to read from it.

The reader will, of course, have to be prepared to deal with an incomplete file (bearing in mind that there may be I/O buffering happening on the writer's side).
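
As an illustration of what "prepared to deal with an incomplete file" might mean when the wanted data is near the beginning: the sketch below assumes the data is a single newline-terminated, UTF-8 header line (an assumption, nothing in the question says so) and treats a missing newline as "not written yet". The function name read_first_line is hypothetical.

```python
def read_first_line(path):
    """Return the first line of the file, or None if it is not complete yet."""
    with open(path, "rb") as f:
        line = f.readline()
    # Without a trailing newline the writer may still be in the middle of
    # the line (or its buffer has not been flushed), so report "not ready".
    if not line.endswith(b"\n"):
        return None
    return line.rstrip(b"\r\n").decode("utf-8")
```

The caller can simply retry on the next polling cycle whenever None comes back.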

If that's a non-starter, you'll have to think of some scheme to synchronize the writer and the reader, for example:

  • explicitly lock the file;
  • write the data to a temporary location and only move it into its final place when the file is complete (the move operation can be done atomically, provided both the source and the destination reside on the same file system).
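
For the explicit-locking option, a minimal sketch using advisory flock locks from Python's fcntl module might look like this. It only helps if both the writer and the reader are written this way, since a process that never asks for the lock is not stopped by it. The function names are made up for illustration, and the caveats about networked file systems still apply.

```python
import fcntl

def write_locked(path, data):
    """Writer: replace the file's contents while holding an exclusive lock."""
    # Open in append mode so nothing is truncated before the lock is held.
    with open(path, "a") as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks while any other lock is held
        f.truncate(0)                  # now it is safe to discard old contents
        f.write(data)
        f.flush()
        # Closing the file at the end of the with block releases the lock.

def read_locked(path):
    """Reader: wait for any writer to finish, then read the whole file."""
    with open(path) as f:
        fcntl.flock(f, fcntl.LOCK_SH)  # shared lock: many readers, no writer
        return f.read()
```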


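If you have some control over the writing program, have it write the file somewhere else (like the /tmp directory) and then, when it's done, move it to the directory being watched. Note that if the two locations are on different file systems, the move becomes a copy rather than an atomic rename, so a staging directory on the same file system as the watched directory is the safer choice.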
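
A sketch of the "write elsewhere, then move" step, with a guard for the case where the staging location and the watched directory are on different file systems (where a move is no longer an atomic rename). The name publish and the decision to raise an error in that case are illustrative assumptions, not part of the original advice.

```python
import os

def publish(staged_path, watched_dir):
    """Move a finished file into the watched directory with an atomic rename."""
    # A rename is only atomic within one file system, so compare device IDs.
    if os.stat(staged_path).st_dev != os.stat(watched_dir).st_dev:
        raise OSError("staging area and watched directory are on different file systems")
    final_path = os.path.join(watched_dir, os.path.basename(staged_path))
    os.replace(staged_path, final_path)
    return final_path
```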

If you don't have control of the program doing the writing (and by 'control' I mean 'edit the source code'), you probably won't be able to make it do file locking either, so that's probably out. In which case you'll likely need to know something about the file format to know when the writer is done. For instance, if the writer always writes "DONE" as the last four characters in the file, you could open the file, seek to the end, and read the last four characters.
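
The end-marker check described above could be sketched like this. The "DONE" marker is the hypothetical example from this answer, and is_complete is a made-up name; a real file format would have its own way of signalling completion.

```python
import os

def is_complete(path, marker=b"DONE"):
    """Return True if the file ends with the expected end-of-file marker."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < len(marker):
            return False  # too short to contain the marker at all
        f.seek(-len(marker), os.SEEK_END)
        return f.read(len(marker)) == marker
```

The polling script would skip any file for which is_complete returns False and look at it again on the next pass.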


Yes, it will.

I prefer the "file naming convention" and renaming solution described by Donal.
