snprintf in signal handler creates segmentation fault if started with valgrind

This very simple C program gives me a segmentation fault when I run it under valgrind. It runs fine when started normally. It crashes when you send the USR1 signal to the process.

The problem seems to be the way snprintf handles the formatting of the float value, because it works fine if you use a string (%s) or int (%d) format parameter.

P.S. I know you shouldn't call any printf-family function inside a signal handler, but still, why does it only crash with valgrind?

    #include <stdio.h>
    #include <signal.h>

    void sig_usr1(int sig) {
            char buf[128];
            snprintf(buf, sizeof(buf), "%f", 1.0);
    }

    int main(int argc, char **argv) {
            (void) signal(SIGUSR1, sig_usr1);
            while(1);
    }

As cnicutar notes, valgrind may have an effect on anything timing related and signal handlers would certainly qualify.

I don't think snprintf is safe to use in a signal handler, so it might be working in the non-valgrind case solely by accident; then valgrind comes in, changes the timing, and you get the flaming death that you were risking without valgrind.

I found a list of functions that are safe in signal handlers (according to POSIX.1-2003) here:

http://linux.die.net/man/2/signal

Yes, the linux.die.net man pages are a bit out of date but the list here (thanks to RedX for finding this one):

https://www.securecoding.cert.org/confluence/display/seccode/SIG30-C.+Call+only+asynchronous-safe+functions+within+signal+handlers

doesn't mention snprintf either, except in the context of OpenBSD, where it says:

... asynchronous-safe in OpenBSD but "probably not on other systems," including snprintf(), ...

so the implication is that snprintf is not, in general, safe in a signal handler.

And, thanks to Nemo, we have an authoritative list of functions that are safe for use in signal handlers:

http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_04_03

Start at that link and search down for _Exit and you'll see the list; then you'll see that snprintf is not on the list.

Also, I remember using write() in a signal handler because fprintf wasn't safe for a signal handler but that was a long time ago.

I don't have a copy of the relevant standard so I can't back this up with anything really authoritative but I thought I'd mention it anyway.

From the Valgrind manual: http://www.network-theory.co.uk/docs/valgrind/valgrind_27.html and http://www.network-theory.co.uk/docs/valgrind/valgrind_24.html

Valgrind's signal simulation is not as robust as it could be. Basic POSIX-compliant sigaction and sigprocmask functionality is supplied, but it's conceivable that things could go badly awry if you do weird things with signals. Workaround: don't. Programs that do non-POSIX signal tricks are in any case inherently unportable, so should be avoided if possible.

So, calling snprintf in a signal handler is not a POSIX-allowed signal trick, and valgrind has a right to break your program.
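One common way to stay within what POSIX allows is to have the handler only set a flag and do the formatting later in normal code. A rough sketch, where usr1_pending and format_if_signalled() are hypothetical names I made up for the example:

```c
#include <signal.h>
#include <stdio.h>

/* Set by the handler, consumed by normal (non-handler) code. */
static volatile sig_atomic_t usr1_pending = 0;

static void sig_usr1(int sig) {
    (void) sig;
    usr1_pending = 1;   /* the only thing the handler does */
}

/*
 * Hypothetical helper, meant to be called from the main loop.
 * If a signal arrived since the last call, formats value into buf
 * and returns 1; otherwise returns 0 and leaves buf untouched.
 */
static int format_if_signalled(char *buf, size_t size, double value) {
    if (!usr1_pending)
        return 0;
    usr1_pending = 0;
    /* snprintf is fine here because we are no longer inside the handler. */
    snprintf(buf, size, "%f", value);
    return 1;
}
```

The main loop would install sig_usr1 as before and call format_if_signalled() on each iteration instead of spinning in while(1). A signal that arrives between the flag check and the reset is still picked up on the next iteration, since the handler sets the flag again.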

Why is snprintf not signal-safe?

The glibc manual says: http://www.gnu.org/software/hello/manual/libc/Nonreentrancy.html

If a function uses and modifies an object that you supply, then it is potentially non-reentrant; two calls can interfere if they use the same object.

This case arises when you do I/O using streams. Suppose that the signal handler prints a message with fprintf. Suppose that the program was in the middle of an fprintf call using the same stream when the signal was delivered. Both the signal handler's message and the program's data could be corrupted, because both calls operate on the same data structure—the stream itself.

However, if you know that the stream that the handler uses cannot possibly be used by the program at a time when signals can arrive, then you are safe. It is no problem if the program uses some other stream.

You could object that the s*printf* functions operate on strings, not on streams. Internally, however, glibc's snprintf does work on a special stream:

ftp://sources.redhat.com/pub/glibc/snapshots/glibc-latest.tar.bz2/glibc-20090518/libio/vsnprintf.c

    int
    _IO_vsnprintf (string, maxlen, format, args)
    {
      _IO_strnfile sf; // <<-- FILE*-like descriptor

The %f output code in glibc also has a malloc call inside it:

ftp://sources.redhat.com/pub/glibc/snapshots/glibc-latest.tar.bz2/glibc-20090518/stdio-common/printf_fp.c

    /* Allocate buffer for output.  We need two more because while rounding
       it is possible that we need two more characters in front of all the
       other output.  If the amount of memory we have to allocate is too
       large use `malloc' instead of `alloca'.  */
    size_t wbuffer_to_alloc = (2 + (size_t) chars_needed) * sizeof (wchar_t);
    buffer_malloced = ! __libc_use_alloca (chars_needed * 2 * sizeof (wchar_t));
    if (__builtin_expect (buffer_malloced, 0))
      {
        wbuffer = (wchar_t *) malloc (wbuffer_to_alloc);
        if (wbuffer == NULL)
          /* Signal an error to the caller.  */
          return -1;
      }
    else
      wbuffer = (wchar_t *) alloca (wbuffer_to_alloc);
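If a handler really has to emit a number, one option is to skip the printf family altogether and build the digits by hand, which needs neither a stream nor malloc. A small sketch, assuming the value can be represented as an unsigned integer (for example, a float pre-scaled to thousandths outside the handler); format_ulong() and write_ulong() are hypothetical helpers, not glibc functions.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes the decimal digits of v into buf as a
 * NUL-terminated string using only integer arithmetic.
 * Returns the number of digits, or 0 if buf is too small.
 */
static size_t format_ulong(char *buf, size_t size, unsigned long v) {
    char tmp[3 * sizeof(unsigned long)];  /* enough for every decimal digit */
    size_t n = 0;
    do {                                  /* digits come out least significant first */
        tmp[n++] = (char) ('0' + v % 10);
        v /= 10;
    } while (v != 0);
    if (n + 1 > size)
        return 0;
    for (size_t i = 0; i < n; i++)        /* reverse into the caller's buffer */
        buf[i] = tmp[n - 1 - i];
    buf[n] = '\0';
    return n;
}

/* Hypothetical helper: formats v on the stack and emits it with write(). */
static void write_ulong(int fd, unsigned long v) {
    char buf[3 * sizeof(unsigned long) + 1];
    size_t n = format_ulong(buf, sizeof(buf), v);
    if (n > 0) {
        ssize_t ignored = write(fd, buf, n);
        (void) ignored;
    }
}
```

Both helpers use only stack buffers and write(), so they avoid the two problems shown above (the internal stream and the possible malloc).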

Valgrind slightly changes the timings in your program.

Have a look at the FAQ.

My program crashes normally, but doesn't under Valgrind, or vice versa. What's happening?

When a program runs under Valgrind, its environment is slightly different to when it runs natively. Most of the time this doesn't make any difference, but it can, particularly if your program is buggy.

This is a valgrind bug. It calls your signal handler with a stack that is not 16-byte aligned as required by the ABI. On x86_64, floating-point arguments are passed in XMM registers, and the aligned SSE store instructions typically used to save them to the stack (such as movaps) fault if the address is not 16-byte aligned. You can work around the problem by compiling for 32-bit (gcc -m32).