linux threads and fopen() fclose() fgets()_问答_开发者

I'm looking at some legacy Linux code which uses pthreads.

In one thread a file is read via fgets(). The FILE variable is a global开发者_如何学Go variable shared across all threads. (Hey, I didn't write this...)

In another thread every now and again the FILE is closed and reopened with another filename.

For several seconds after this has happened, the thread fgets() acts as if it is continuing to read the last record it read from the previous file: almost as if there was an error but fgets() was not returning NULL. Then it sorts itself out and starts reading from the new file.

The code looks a bit like this (snipped for brevity so I hope it's still intelligible):

In one thread:

while(gRunState != S_EXIT){
  nanosleep(&timer_delay,0);
  flag = fgets(buff, sizeof(buff), gFile);
  if (flag != NULL){
    // do something with buff...
  }
}

In the other thread:

fclose(gFile);
gFile = fopen(newFileName,"r");

There's no lock to make sure that the fgets() is not called at the same time as the fclose()/fopen().

Any thoughts as to failure modes which might cause fgets() to fail but not return NULL?

How the described code goes wrong

The stdio library buffers data, allocating memory to store the buffered data. The GNU C library dynamically allocates file structures (some libraries, notably on Solaris, use pointers to statically allocated file structures, but the buffer is still dynamically allocated unless you set the buffering otherwise).

If your thread works with a copy of a pointer to the global file pointer (because you passed the file pointer to the function as an argument), then it is conceivable that the code would continue to access the data structure that was orginally allocated (even though it was freed by the close), and would read data from the buffer that was already present. It would only be when you exit the function, or read beyond the contents of the buffer, that things start going wrong - or the space that was previously allocated to the file structure is reallocated for a new use.

FILE *global_fp;

void somefunc(FILE *fp, ...)
{
    ...
    while (fgets(buffer, sizeof(buffer), fp) != 0)
        ...
}

void another_function(...)
{
    ...
    /* Pass global file pointer by value */
    somefunc(global_fp, ...);
    ...
}

Proof of Concept Code

Tested on MacOS X 10.5.8 (Leopard) with GCC 4.0.1:

#include <stdio.h>
#include <stdlib.h>

FILE *global_fp;
const char etc_passwd[] = "/etc/passwd";

static void error(const char *fmt, const char *str)
{
    fprintf(stderr, fmt, str);
    exit(1);
}

static void abuse(FILE *fp, const char *filename)
{
    char buffer1[1024];
    char buffer2[1024];
    if (fgets(buffer1, sizeof(buffer1), fp) == 0)
        error("Failed to read buffer1 from %s\n", filename);
    printf("buffer1: %s", buffer1);

    /* Dangerous!!! */
    fclose(global_fp);
    if ((global_fp = fopen(etc_passwd, "r")) == 0)
        error("Failed to open file %s\n", etc_passwd);

    if (fgets(buffer2, sizeof(buffer2), fp) == 0)
        error("Failed to read buffer2 from %s\n", filename);
    printf("buffer2: %s", buffer2);
}

int main(int argc, char **argv)
{
    if (argc != 2)
        error("Usage: %s file\n", argv[0]);

    if ((global_fp = fopen(argv[1], "r")) == 0)
        error("Failed to open file %s\n", argv[1]);

    abuse(global_fp, argv[1]);

    return(0);
}

When run on its own source code, the output was:

Osiris JL: ./xx xx.c
buffer1: #include <stdio.h>
buffer2: ##
Osiris JL:

So, empirical proof that on some systems, the scenario I outlined can occur.

How to fix the code

The fix to the code is discussed well in other answers. If you avoid the problem I illustrated (for example, by avoiding global file pointers), that is simplest. Assuming that is not possible, it may be sufficient to compile with the appropriate flags (on many Unix-like systems, the compiler flag '-D_REENTRANT' does the job), and you will end up using thread-safe versions of the basic standard I/O functions. Failing that, you may need to put explicit thread-safe management policies around the access to the file pointers; a mutex or something similar (and modify the code to ensure that the threads use the mutex before using the corresponding file pointer).

A FILE * is just a pointer to the various resources. If the fclose does not zero out those resource, it's possible that the values may make enough sense that fgets does not immediately notice it.

That said, until you add some locking, I would consider this code completely broken.

Umm, you really need to control access to the FILE stream with a mutex, at the minimum. You aren't looking at some clever implementation of lock free methods, you are looking at really bad (and dusty) code.

Using thread local FILE streams is the obvious and most elegant fix, just use locks appropriately to ensure no two threads operate on the same offset of the same file at once. Or, more simply, ensure that threads block (or do other work) while waiting for the file lock to clear. POSIX advisory locks would be best for this, or your dealing with dynamically growing a tree of mutexes... or initializing a file lock mutex per thread and making each thread check the other's lock (yuck!) (since files can be re-named).

I think you are staring down the barrel of some major fixes .. unfortunately (from what you have indicated) there is no choice but to make them. In this case, its actually easier to debug a threaded program written in this manner than it would be to debug something using forks, consider yourself lucky :)

You can also put some condition-wait (pthread_cond_wait) instead of just some nanosleep which will get signaled when intended e.g. when a new file gets fopened.