
End of nonblocking file

Developer https://www.devze.com 2023-02-25 10:23 Source: Internet

How is end of file detected for a file in nonblocking mode?


At least on POSIX (including Linux), the obvious answer is that nonblocking regular files don't exist. Regular files ALWAYS block, and O_NONBLOCK is silently ignored.

Similarly, poll(), select() et al. will always tell you that an fd pointing to a regular file is ready for I/O, regardless of whether the data is already in the page cache or still on disk (mostly relevant for reading).
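A minimal Python sketch of this, assuming a POSIX system such as Linux (on Windows, Python's select() only accepts sockets). An empty regular file, where any read() would immediately hit EOF, is still reported as ready by both select.select() and select.poll(). The helper name `regular_file_readiness` is hypothetical.

```python
import select
import tempfile

# Assumes a POSIX system; regular_file_readiness is a hypothetical helper name.
def regular_file_readiness():
    """Return (select_says_readable, poll_revents) for an empty regular file."""
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        # Zero timeout: only ask for the current readiness state.
        readable, _, _ = select.select([fd], [], [], 0)
        p = select.poll()
        p.register(fd, select.POLLIN)
        events = p.poll(0)  # list of (fd, revents) pairs
        revents = events[0][1] if events else 0
        return fd in readable, revents

if __name__ == "__main__":
    ready, revents = regular_file_readiness()
    print("select readable:", ready)
    print("poll POLLIN set:", bool(revents & select.POLLIN))
```

Both calls report "ready" even though there is nothing to read, which is why readiness notification gives no useful information for regular files.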

EDIT And, since O_NONBLOCK is a no-op for regular files, a read() on a regular file will never set errno to EAGAIN, contrary to what another answer to this question claims.
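To see the contrast, here is a minimal Python sketch, assuming Linux or another POSIX system: a non-blocking read on an empty pipe (with the write end still open) fails with EAGAIN, which Python raises as BlockingIOError, while a regular file opened with O_NONBLOCK simply returns b"" at end of file. The helper names `nonblocking_read_outcome` and `demo` are hypothetical.

```python
import os
import tempfile

# Assumes a POSIX system; the helper names here are hypothetical.
def nonblocking_read_outcome(fd):
    """Classify what a single read on fd does: 'EAGAIN', 'EOF' or 'data'."""
    try:
        data = os.read(fd, 4096)
    except BlockingIOError:
        # os.read raises BlockingIOError when read() fails with EAGAIN.
        return "EAGAIN"
    return "EOF" if data == b"" else "data"

def demo():
    # A pipe with the write end still open and nothing written to it.
    r, w = os.pipe()
    os.set_blocking(r, False)
    pipe_result = nonblocking_read_outcome(r)
    os.close(r)
    os.close(w)

    # An empty regular file opened with O_NONBLOCK (the flag has no effect here).
    with tempfile.NamedTemporaryFile() as tmp:
        fd = os.open(tmp.name, os.O_RDONLY | os.O_NONBLOCK)
        try:
            file_result = nonblocking_read_outcome(fd)
        finally:
            os.close(fd)
    return pipe_result, file_result

if __name__ == "__main__":
    print(demo())
```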

EDIT2 References:

From the POSIX (p)select() specification: "File descriptors associated with regular files shall always select true for ready to read, ready to write, and error conditions."

From the POSIX poll() specification: "Regular files shall always poll TRUE for reading and writing."

The above suffices to imply that, while perhaps not strictly prohibited, non-blocking regular files don't make sense, as there would be no way to poll them except busy-waiting.

Beyond the above, there is at least some circumstantial evidence:

From the POSIX open() specification: The behavior for file descriptors referring to pipes, block special files, and character special files is defined. "Otherwise, the behavior of O_NONBLOCK is unspecified."

Some related links:

http://tinyclouds.org/iocp-links.html

http://www.remlab.net/op/nonblock.shtml

http://davmac.org/davpage/linux/async-io.html

And, even here on stackoverflow:

Can regular file reading benefited from nonblocking-IO?

As the answer by R. points out, due to how page caching works, non-blocking for regular files is not very easily defined. E.g. what if by some mechanism you find out that data is ready for reading in the page cache, and then before you read it the kernel decides to kick that page out of cache due to memory pressure? It's different for things like sockets and pipes, because correctness requires that data is not discarded just like that.

Also, how would you select/poll for a seekable file descriptor? You'd need some new API that supports specifying which byte range in the file you're interested in. The kernel implementation of that API would have to tie in to the VM system, since it would need to prevent the pages you're interested in from being evicted. That in turn implies those pages would have to count against the process's locked-pages limit (see ulimit -l) in order to prevent a DoS. And when would those pages be unlocked? And so on.
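For reference, the locked-memory limit that `ulimit -l` reports can be read from Python with the standard `resource` module. This small sketch assumes a Unix system where `RLIMIT_MEMLOCK` is defined (it is on Linux); note that the shell's `ulimit -l` shows the value in KiB, while `getrlimit()` returns bytes. The helper name `memlock_limits` is hypothetical.

```python
import resource

# Assumes a Unix system with RLIMIT_MEMLOCK; memlock_limits is a hypothetical name.
def memlock_limits():
    """Return (soft, hard) RLIMIT_MEMLOCK in bytes; None means unlimited."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)

    def norm(v):
        return None if v == resource.RLIM_INFINITY else v

    return norm(soft), norm(hard)

if __name__ == "__main__":
    soft, hard = memlock_limits()
    # `ulimit -l` prints the soft limit in KiB (or "unlimited").
    print("soft:", "unlimited" if soft is None else soft // 1024, "KiB")
    print("hard:", "unlimited" if hard is None else hard // 1024, "KiB")
```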


This is a really good question. With non-blocking sockets, recv() returns an empty string at EOF, which is distinguishable from the socket.error it raises when there is simply no data available yet. For files, though, there doesn't seem to be any direct indicator available to Python.
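A minimal Python 3 sketch of that distinction for sockets, using `socket.socketpair()` as a stand-in for a real connection (an assumption for illustration). With no data pending, a non-blocking `recv()` raises `BlockingIOError` (a subclass of `OSError`, which `socket.error` aliases in Python 3); once the peer closes, `recv()` returns an empty bytes object. The helper name `recv_states` is hypothetical.

```python
import socket

# socketpair() stands in for a real connection; recv_states is a hypothetical name.
def recv_states():
    a, b = socket.socketpair()
    a.setblocking(False)
    states = []
    try:
        try:
            a.recv(4096)
        except BlockingIOError:
            # No data yet, but the connection is still open.
            states.append("no data")
        b.sendall(b"hello")
        states.append(a.recv(4096))  # the data is already queued locally
        b.close()
        # Peer closed: recv() returns b"" instead of raising.
        states.append(a.recv(4096))
    finally:
        a.close()
    return states

if __name__ == "__main__":
    print(recv_states())
```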

The only mechanism I can think of for detecting EOF is to compare the current position of the file to the overall file size after receiving an empty string:

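import os

def read_nonblock(fd):
    t = os.read(fd, 4096)
    # os.read() returns bytes in Python 3, so test for emptiness rather than == ''
    if not t:
        # An empty read at an offset equal to the file size means EOF
        if os.fstat(fd).st_size == os.lseek(fd, 0, os.SEEK_CUR):
            raise EOFError("EOF reached")
    return t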

This, of course, assumes that regular files in non-blocking mode will actually return immediately rather than wait for data to be read from the disk. I'm not sure if that's true on Windows or Linux. It'd be worth testing but I wouldn't be surprised if reading of regular files even in non-blocking mode only returns an empty string when the actual EOF is encountered.
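One way to run that test is the following Python sketch, assuming a POSIX system. It writes a file, reopens it with `O_NONBLOCK`, reads it in small chunks, and records the file offset of every empty read. If `O_NONBLOCK` is ignored for regular files, no `BlockingIOError` is raised and every empty read happens at offset == file size. The function name `empty_read_offsets` and its parameters are hypothetical.

```python
import os
import tempfile

# Assumes a POSIX system; empty_read_offsets is a hypothetical helper name.
def empty_read_offsets(payload, chunk=7, extra_reads=3):
    """Return (file size, offsets at which os.read() returned b"")."""
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.write(payload)
        tmp.flush()
        fd = os.open(tmp.name, os.O_RDONLY | os.O_NONBLOCK)
        try:
            size = os.fstat(fd).st_size
            offsets = []
            # Keep reading past EOF a few times to check that b"" is stable.
            while len(offsets) < extra_reads:
                data = os.read(fd, chunk)  # BlockingIOError here would mean EAGAIN
                if not data:
                    offsets.append(os.lseek(fd, 0, os.SEEK_CUR))
        finally:
            os.close(fd)
    return size, offsets

if __name__ == "__main__":
    print(empty_read_offsets(b"x" * 100))
```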


A nice trick that works well in C++ (YMMV): if the amount of data returned is less than the size of the buffer (i.e. the buffer is not full), you can safely assume that the transaction has completed. There is then a 1/buffersize probability that the last part of the file exactly fills the buffer, so with a large buffer you can be reasonably sure that the transaction will end with a partially filled buffer. So if you compare the amount of data returned against the buffer size and they are not equal, you know that either an error occurred or the transaction is complete. I'm not sure whether this translates to Python, but that is my method for spotting EOFs.
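The heuristic can be sketched in Python as an analogue of the C++ approach described above: keep reading while each call fills the whole buffer, and stop at the first short read. The sketch assumes a regular file; on pipes and sockets short reads are normal and do not imply EOF. It also shows the edge case: when the file size is an exact multiple of the buffer size, every read is full, so one extra read returning 0 bytes is needed to confirm EOF. The names `read_until_short` and `demo` are hypothetical.

```python
import os
import tempfile

# Assumes a regular file; read_until_short and demo are hypothetical names.
def read_until_short(fd, bufsize):
    """Read fd until a read returns fewer than bufsize bytes.

    Returns (data, number_of_read_calls). A zero-byte read also counts as
    "short", which covers a file whose size is a multiple of bufsize.
    """
    chunks = []
    calls = 0
    while True:
        chunk = os.read(fd, bufsize)
        calls += 1
        chunks.append(chunk)
        if len(chunk) < bufsize:
            # On a regular file a short read usually means EOF was reached.
            return b"".join(chunks), calls

def demo(size, bufsize=4):
    with tempfile.TemporaryFile() as f:
        f.write(b"a" * size)
        f.flush()
        f.seek(0)  # rewinds the underlying OS file offset too
        return read_until_short(f.fileno(), bufsize)

if __name__ == "__main__":
    print(demo(10))  # sizes 4, 4, 2: the short read ends the loop
    print(demo(8))   # sizes 4, 4, 0: an extra empty read is needed
```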


Doesn't select() tell you there is something to read even if it's just the EOF? If it tells you there is something to read and you don't get anything back, then it must be EOF. I believe this to be the case for sockets.
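For sockets this is indeed how it works. A minimal Python sketch, again using `socket.socketpair()` as a stand-in for a real connection (an assumption for illustration): after the peer closes, `select()` reports the socket as readable, and `recv()` then returns `b""`, the EOF indication. The helper name `eof_after_select` is hypothetical.

```python
import select
import socket

# socketpair() stands in for a real connection; eof_after_select is hypothetical.
def eof_after_select(timeout=1.0):
    a, b = socket.socketpair()
    a.setblocking(False)
    try:
        # Nothing written and peer still open: not readable.
        before, _, _ = select.select([a], [], [], 0)
        b.close()
        # Peer closed: select() now reports the socket as readable...
        after, _, _ = select.select([a], [], [], timeout)
        # ...and recv() returns b"", meaning EOF.
        data = a.recv(4096) if after else None
        return bool(before), bool(after), data
    finally:
        a.close()

if __name__ == "__main__":
    print(eof_after_select())
```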


For files, setting the file descriptor as non-blocking does nothing - all IO is done blocking anyway.

If you really need non-blocking file IO, you need to look into aio_read and friends, which are the asynchronous IO facility for file access. These are pretty non-portable and somewhat flaky at times, so most projects have actually decided to use a separate process (or thread) for IO and just use blocking IO there.
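Python's standard library does not expose aio_read, so this minimal sketch shows the separate-thread approach instead, using `concurrent.futures`: the blocking read runs in a worker thread while the calling thread stays free and collects the result later. The names `_blocking_read_all` and `read_file_in_background` are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helper: ordinary blocking I/O that only blocks the worker thread.
def _blocking_read_all(path):
    with open(path, "rb") as f:
        return f.read()

# Hypothetical helper name.
def read_file_in_background(executor, path):
    """Submit a blocking read; returns a Future resolving to the file's bytes."""
    return executor.submit(_blocking_read_all, path)

if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"some data")
        path = tmp.name
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            fut = read_file_in_background(pool, path)
            # The main thread could do other work (e.g. a select() loop) here.
            print(fut.result())
    finally:
        os.unlink(path)
```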

Then again, maybe you are interested in somehow "select"-ing a file so that you get notified when the file grows. As you've probably realized, select, poll, etc. do not work for this. Most software does this simply by polling the file every second or so; for example, "tail -f" has traditionally done its magic by polling. However, you can also get the kernel to notify you when the file is written to, via inotify and friends. There are some handy libraries that wrap all this up for you so you don't have to muck around with the specifics yourself; for Python, namely inotifyx and pyinotify.
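A minimal sketch of the polling approach in Python, assuming the file is only ever appended to (truncation and log rotation are not handled). `read_new` returns whatever was appended since the last call, and `follow` wraps it in a sleep loop with an arbitrary one-second default interval. Both names are hypothetical; an inotify-based watcher (such as the libraries mentioned above) would replace the sleep, but is not shown here.

```python
import os
import time

# Hypothetical helper: returns bytes appended after the current offset.
def read_new(fd, chunk=65536):
    """Return bytes appended after the current offset (b"" if none yet)."""
    parts = []
    while True:
        data = os.read(fd, chunk)
        if not data:  # caught up with the current end of file
            return b"".join(parts)
        parts.append(data)

# Hypothetical helper: a tail -f style generator that polls for growth.
def follow(path, interval=1.0):
    """Yield newly appended data forever, polling every `interval` seconds."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, 0, os.SEEK_END)  # like tail -f: start at the end
        while True:
            data = read_new(fd)
            if data:
                yield data
            else:
                time.sleep(interval)
    finally:
        os.close(fd)
```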
