My application uses lseek() to seek to the desired position before writing data.
The file is opened successfully with open(), and my application was able to call lseek() and write() many times.
At some point, for some users and not easily reproducible, lseek() returns -1 with an errno of 9. The file is not closed before this, and the file handle (an int) isn't reset.
After this, another file is created; open() succeeds again, and lseek() and write() work again.
To make it even worse, this user tried the complete sequence again and all was well.
So my question is, can the OS close the file handle for me for some reason? What could cause this? A file indexer or file scanner of some sort?
What is the best way to solve this; is this pseudo code the best solution? (never mind the code layout, will create functions for it)
    int fd=open(...);
    if (fd>-1) {
        long result = lseek(fd,....);
        if (result == -1 && errno==9) {
            close(fd..); //make sure we try to close nicely
            fd=open(...);
            result = lseek(fd,....);
        }
    }
Anybody experience with something similar?
Summary: file seek and write work okay for a given fd, and then lseek() suddenly gives back errno=9 for no apparent reason.
> So my question is, can the OS close the file handle for me for some reason? What could cause this? A file indexer or file scanner of some sort?
No, this will not happen.
> What is the best way to solve this; is this pseudo code the best solution? (never mind the code layout, will create functions for it)
No, the best way is to find the bug and fix it.
> Anybody experience with something similar?
I've seen fds get messed up many times, resulting in EBADF in some cases and blowing up spectacularly in others. The causes have been:
- Buffer overflows: overflowing something and writing a nonsense value into an 'int fd;' variable.
- Silly bugs in some corner case, where someone wrote if(fd = foo[i].fd) when they meant if(fd == foo[i].fd).
- Race conditions between threads: some thread closes the wrong file descriptor, which some other thread still wants to use.
If you can find a way to reproduce this problem, run your program under 'strace' so you can see what's going on.
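The "closes the wrong file descriptor" case above can be reproduced in a few lines. This is only a sketch under stated assumptions: the function name stale_close_demo is hypothetical, /dev/null is assumed to exist and be seekable, and no other thread opens files while it runs. It relies on open() handing out the lowest free descriptor number, so a stale copy of an old number ends up closing a newer, unrelated file.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns the errno that lseek() reports after a stale
 * close(), or 0 if lseek() unexpectedly succeeded.
 */
int stale_close_demo(void)
{
    int a = open("/dev/null", O_WRONLY);
    assert(a >= 0);                 /* setup must succeed for the demo */

    int stale = a;                  /* some other code path keeps a copy */
    close(a);                       /* the legitimate close */

    /* open() returns the lowest free number, so with no other thread
     * opening files in between, b reuses the number that a had. */
    int b = open("/dev/null", O_WRONLY);
    assert(b >= 0);

    close(stale);                   /* BUG: stale copy, this closes b */

    errno = 0;
    off_t pos = lseek(b, 0, SEEK_SET);
    int err = (pos == (off_t)-1) ? errno : 0;

    if (err == 0)
        close(b);                   /* b survived (numbers differed): clean up */
    return err;
}
```

From the point of view of the code that owns b, nothing closed the file, yet lseek() fails with EBADF. Under strace the second close() on the same number shows up right before the failing lseek().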
The OS will not close file handles randomly (I am assuming a Unix-like system). If your file handle is closed, then there is something wrong with your code, most probably elsewhere (thanks to the C language and the Unix API, this can really be anywhere in the code, and may be due to, e.g., a slight buffer overflow in some piece of code that looks entirely unrelated).
Your pseudo-code is the worst solution, since it will give you the impression of having fixed the problem, while the bug still lurks.
I suggest that you add debug prints (i.e. printf() calls) wherever you open and close a file or socket. Also, try Valgrind.
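One way to get those debug prints without editing every call site by hand is a pair of logging wrappers. The following is a rough sketch, not a drop-in solution: dbg_open, dbg_close, DBG_OPEN and DBG_CLOSE are hypothetical names, and it logs to stderr rather than through printf() so the output is not held back by stdout buffering.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: open() plus a log line naming the call site. */
static int dbg_open(const char *path, int flags, mode_t mode,
                    const char *file, int line)
{
    assert(path != NULL);
    int fd = open(path, flags, mode);
    int saved = errno;              /* fprintf() may overwrite errno */
    if (fd < 0)
        fprintf(stderr, "%s:%d: open(\"%s\") failed: %s\n",
                file, line, path, strerror(saved));
    else
        fprintf(stderr, "%s:%d: open(\"%s\") = %d\n", file, line, path, fd);
    errno = saved;
    return fd;
}

/* Hypothetical wrapper: close() plus a log line; a failing close() is a
 * strong hint that the descriptor was already closed somewhere else. */
static int dbg_close(int fd, const char *file, int line)
{
    int rc = close(fd);
    int saved = errno;
    if (rc < 0)
        fprintf(stderr, "%s:%d: close(%d) failed: %s\n",
                file, line, fd, strerror(saved));
    else
        fprintf(stderr, "%s:%d: close(%d) ok\n", file, line, fd);
    errno = saved;
    return rc;
}

#define DBG_OPEN(path, flags, mode) \
    dbg_open((path), (flags), (mode), __FILE__, __LINE__)
#define DBG_CLOSE(fd) dbg_close((fd), __FILE__, __LINE__)
```

Replacing open(...) with DBG_OPEN(...) and close(fd) with DBG_CLOSE(fd) gives a log in which every descriptor number can be traced from the line that opened it to the line that closed it. A close() of a number that the log shows as already closed points at the culprit.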
(Just yesterday I had a spooky off-by-1 buffer overflow, which damaged the least significant byte of a temporary slot generated by the compiler to save a CPU register; the indirect effect was that a structure in another function appeared to be shifted by a few bytes. It took me quite some time to understand what was going on, including some thorough reading of MIPS assembly code.)
I don't know what type of setup you have, but I think the following scenario could produce such an effect (or one similar to it). I have not tested this to verify, so please take it with a grain of salt.
If the file/device you are opening is implemented by a server application (e.g. NFS), consider what could happen if the server application goes down / restarts / reboots. The file descriptor, though originally valid at the client end, might no longer map to a valid file handle at the server end. This can conceivably lead to a sequence of events in which the client gets EBADF.
Hope this helps.
No, the OS should not close file handles just like that, and other applications (file scanners etc.) should not be able to do it.
Do not work around the problem, find its source. If you don't know what the reason for your problem was, you will never know if your workaround actually works.
- Check your assumptions. Is errno set to 0 before the call? Is fd really valid at the point the call is being made? (I know you said it is, but did you check it?)
- What is the output of puts( strerror( 9 ) ); on your platform?
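The checks above could be wired into the failing call roughly as follows. This is a minimal sketch with hypothetical helper names (fd_is_open, checked_lseek): it clears errno before the call, prints the errno text via strerror(), and uses fcntl(fd, F_GETFD) to ask the kernel whether the number still refers to an open descriptor in this process.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: 1 if fd is an open descriptor in this process,
 * 0 if the kernel reports EBADF for it. */
int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Hypothetical helper: lseek() that reports why it failed.
 * Returns exactly what lseek() returned and leaves its errno in place. */
off_t checked_lseek(int fd, off_t offset, int whence)
{
    assert(fd >= 0);                /* a negative fd is always a caller bug */

    errno = 0;                      /* so a stale errno cannot mislead us */
    off_t pos = lseek(fd, offset, whence);
    if (pos == (off_t)-1) {
        int saved = errno;          /* the calls below may change errno */
        fprintf(stderr, "lseek(fd=%d) failed: errno=%d (%s); fd %s open\n",
                fd, saved, strerror(saved),
                fd_is_open(fd) ? "is" : "is not");
        errno = saved;
    }
    return pos;
}
```

If the message says the descriptor is not open, something in the process closed it or overwrote the variable holding it. If it is open and errno is something other than EBADF, the problem lies elsewhere.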