Which functions are re-entrant in Python for signal library processing

While discussing signal handlers and logging in Python, I started wondering which functions are re-entrant in Python.

The signal library documentation mentions:

Although Python signal handlers are called asynchronously as far as the Python user is concerned, they can only occur between the atomic instructions of the Python interpreter. This means that signals arriving during long calculations implemented purely in C (such as regular expression matches on large bodies of text) may be delayed for an arbitrary amount of time.

The logging library documentation points out that re-entrancy cannot be taken for granted:

If you are implementing asynchronous signal handlers using the signal module, you may not be able to use logging from within such handlers. This is because lock implementations in the threading module are not always re-entrant, and so cannot be invoked from such signal handlers.

I'm a little confused, because the signal documentation seems to be referring to the GIL (global interpreter lock) when it says "... between the atomic instructions ...". In that case signals are postponed and handled as soon as the GIL is released: a kind of signal queue.

That makes sense, but then it should not matter whether the functions called by the postponed signal handler are re-entrant, because they are not called within the real POSIX signal handler, which is the one subject to the "re-entrant" limitation:

Only a defined list of POSIX C functions is declared re-entrant and may be called within a POSIX signal handler. IEEE Std 1003.1 lists 118 re-entrant UNIX functions, which you can find at https://www.opengroup.org/ (login required).

I believe that what makes the logging module non-reentrant is that it uses a threading.Lock (instead of an RLock) to synchronize several threads logging to the same handlers (so messages don't get interleaved).

This means that if a logging call that has acquired a lock is interrupted by a signal handler, and that signal handler tries to log, it will deadlock forever waiting for the earlier acquire to be released.

These locks have nothing to do with the GIL, by the way. They are "user created" locks, so to speak, whereas the GIL is a lock used by the interpreter itself (an implementation detail).
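The difference between the two lock types can be checked without hanging the program. The following is only a small sketch: the helper name can_reacquire is made up for illustration, and it uses a non-blocking acquire so that the "deadlock" case shows up as a failed acquire instead of a real hang.

```python
import threading

def can_reacquire(lock):
    """Return True if the thread that already holds `lock` can take it again.

    `can_reacquire` is a hypothetical helper for this illustration only.
    """
    with lock:
        # A signal handler runs in the same thread as the code it interrupted,
        # so this second acquire mimics a handler that needs the same lock.
        got_it = lock.acquire(blocking=False)
        if got_it:
            lock.release()
        return got_it

print(can_reacquire(threading.Lock()))   # plain lock: second acquire fails
print(can_reacquire(threading.RLock()))  # re-entrant lock: second acquire succeeds
```

With a blocking acquire, the first case would wait forever, which is the deadlock described above.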

Some people might prefer to listen for signals using pselect() / ppoll() / a Linux signalfd. However, pselect() and ppoll() are not available in the Python select module.

Some event loops claim to support signals. If you are considering using an event loop, you could look at its documentation. For example: https://docs.python.org/3/library/asyncio-eventloop.html#unix-signals
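For asyncio specifically, the linked page documents loop.add_signal_handler(). A rough sketch of how that might be used follows; it assumes a Unix platform and Python 3.8+ (for signal.raise_signal), and the choice of SIGUSR1 and the self-delivered signal are only there to make the example self-contained.

```python
import asyncio
import signal

async def main():
    loop = asyncio.get_running_loop()
    got_signal = asyncio.Event()

    # The callback runs inside the event loop, not inside the
    # low-level signal handler, so ordinary asyncio calls are fine here.
    loop.add_signal_handler(signal.SIGUSR1, got_signal.set)
    try:
        # Assumption for the demo: send the signal to ourselves.
        signal.raise_signal(signal.SIGUSR1)
        await asyncio.wait_for(got_signal.wait(), timeout=2)
    finally:
        loop.remove_signal_handler(signal.SIGUSR1)
    return got_signal.is_set()

print(asyncio.run(main()))
```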

Some event loops, like the built-in asyncio module, are currently implemented using signal.set_wakeup_fd(). This is buggy. See the heading below.

Otherwise, to answer the letter of your question: os.write(). You can then use the self-pipe trick.

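import errno
import fcntl
import os
import signal

# os.pipe() returns (read_end, write_end), in that order.
sigint_read_pipe, sigint_write_pipe = os.pipe()

# Make the write end non-blocking so the handler can never block.
# O_CLOEXEC is not a status flag that F_SETFL can set; since Python 3.4
# the descriptors returned by os.pipe() are already non-inheritable.
flags = fcntl.fcntl(sigint_write_pipe, fcntl.F_GETFL)
fcntl.fcntl(sigint_write_pipe, fcntl.F_SETFL, flags | os.O_NONBLOCK)

def handle_sigint(signum, frame):
    try:
        os.write(sigint_write_pipe, b'\0')
    except OSError as e:
        if e.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
            pass  # pipe is already full. no problem.
        else:
            raise

signal.signal(signal.SIGINT, handle_sigint)

# Now listen to sigint_read_pipe, using your preferred
# select() / poll() / event loop etc
...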
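The listening side is left open above. One possible way to fill it in is sketched here as a separate, self-contained example using select.select(). The names wait_for_signal and on_signal, the use of SIGUSR1 instead of SIGINT, and the self-delivered signal are all assumptions made for the demo (Unix, Python 3.8+).

```python
import os
import select
import signal

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)   # so draining the pipe never blocks
os.set_blocking(write_fd, False)  # so the handler never blocks

def on_signal(signum, frame):
    try:
        os.write(write_fd, b"\0")
    except BlockingIOError:
        pass  # pipe is full, so a wakeup is already pending

def wait_for_signal(timeout):
    """Return True if a signal arrived within `timeout` seconds."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return False
    try:
        # Drain everything so the next call starts from an empty pipe.
        while os.read(read_fd, 4096):
            pass
    except BlockingIOError:
        pass
    return True

signal.signal(signal.SIGUSR1, on_signal)
signal.raise_signal(signal.SIGUSR1)  # demo only: signal ourselves
print(wait_for_signal(2.0))
```

Several signals arriving close together collapse into a single wakeup here, which is usually acceptable when the pipe is dedicated to one signal.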

There are several ways a function could achieve async-signal safety. os.write() is the function most likely to meet the first criterion:

Functions implemented purely in C, because the Python-level signal handler does not interrupt C functions.
Python functions that do not access mutable global variables.
Python functions that access mutable global variables, where their "invariants" are never temporarily broken. E.g. a single variable to which no invariant applies.
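The third case is the familiar "set a flag" handler. Here is a minimal sketch of it; the names shutdown_requested, request_shutdown and run_until_signalled are invented for this example, and SIGUSR1 is an arbitrary choice (Unix, Python 3.8+ for signal.raise_signal).

```python
import signal

# A single global with no invariant tying it to other state:
# the handler can flip it at any point without leaving anything half-updated.
shutdown_requested = False

def request_shutdown(signum, frame):
    global shutdown_requested
    shutdown_requested = True

def run_until_signalled(do_step, max_steps=1000):
    """Call do_step(i) repeatedly until the flag is set or max_steps is hit."""
    steps = 0
    while not shutdown_requested and steps < max_steps:
        steps += 1
        do_step(steps)
    return steps

def demo_step(i):
    # Demo only: pretend an external signal arrives during the third step.
    if i == 3:
        signal.raise_signal(signal.SIGUSR1)

signal.signal(signal.SIGUSR1, request_shutdown)
steps_done = run_until_signalled(demo_step)
print(shutdown_requested, steps_done)
```

All real work happens in the main loop; the handler itself touches nothing but the flag.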

In many cases, async-signal safety will be considered a private implementation detail, not a public guarantee of future behaviour. This is true even in C. The official Python documentation does not address your concern, so we should not rely on the Python documentation as a guide here.

signal.set_wakeup_fd()

If you still believe the Python documentation, there is a second option that is "commonly used". Pass the write end of a pipe to signal.set_wakeup_fd(), and poll the read end. This lets you detect when your program has been interrupted by a signal. It does not reliably let you detect which signal it was, because there could have been more than one, and they could overflow the pipe buffer and be lost.
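A sketch of that second option is below, under the same assumptions as before (Unix, Python 3.8+, SIGUSR1 chosen arbitrarily, wait_for_wakeup being a made-up helper name). Python writes one byte per signal to the wakeup descriptor, the byte being the signal number. Given the overflow caveat above, the sketch treats the bytes as a wakeup hint and not as a complete record.

```python
import os
import select
import signal

read_fd, write_fd = os.pipe()
os.set_blocking(read_fd, False)
os.set_blocking(write_fd, False)  # set_wakeup_fd() needs a non-blocking fd

# A Python-level handler must be installed for the signal; otherwise the
# default action (terminating the process, for SIGUSR1) still applies.
signal.signal(signal.SIGUSR1, lambda signum, frame: None)
signal.set_wakeup_fd(write_fd)

def wait_for_wakeup(timeout):
    """Return the bytes written by arriving signals, or b'' on timeout."""
    readable, _, _ = select.select([read_fd], [], [], timeout)
    if not readable:
        return b""
    try:
        return os.read(read_fd, 4096)
    except BlockingIOError:
        return b""

signal.raise_signal(signal.SIGUSR1)  # demo only: signal ourselves
print(list(wait_for_wakeup(2.0)))
```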