Implementing cancellable syscalls in userspace

I'm working on implementing pthread cancellation on Linux without any of the "unpleasant behavior" (some might say bugs) discussed in some of my other recent questions. The Linux/glibc approach to pthread cancellation so far has been to treat it as something that doesn't need kernel support, and that can be handled at the library level purely by enabling asynchronous cancellation before making a syscall, and restoring the previous cancellation state after the syscall returns. This has at least 2 problems, one of them extremely serious:

1. Cancellation can act after the syscall has returned from kernelspace, but before userspace saves the return value. This results in a resource leak if the syscall allocated a resource, and there is no way to patch over it with cancellation handlers.
2. If a signal is handled while the thread is blocked at a cancellable syscall, the entire signal handler runs with asynchronous cancellation enabled. This could be extremely dangerous, since the signal handler may call functions which are async-signal-safe but not async-cancel-safe.
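For concreteness, the library-level approach described above amounts to roughly the following. This is a minimal sketch and not glibc's actual code; `naive_cancellable_read` is a hypothetical name, and the comments mark where each of the two problems arises.

```c
#include <pthread.h>
#include <unistd.h>

/* Hypothetical wrapper illustrating the "enable async cancellation around
 * the syscall" approach. Not taken from any real libc. */
static ssize_t naive_cancellable_read(int fd, void *buf, size_t count)
{
    int old_type, ignored;
    ssize_t ret;

    /* From here on, a cancellation request may act at any instruction. */
    pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);

    /* Problem 2: a signal handler that interrupts read() also runs with
     * asynchronous cancellation enabled. */
    ret = read(fd, buf, count);

    /* Problem 1: cancellation can still act in this window, after the
     * kernel has returned but before the caller ever sees ret. */
    pthread_setcanceltype(old_type, &ignored);
    return ret;
}
```

If the wrapped call were `open()` or `accept()` instead of `read()`, a cancellation landing in the second window would leak the new file descriptor.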

My first idea for fixing the problem was to set a flag that the thread is at a cancellation point, rather than enabling async cancellation, and when this flag is set, have the cancellation signal handler check the saved instruction pointer to see if it points to a syscall instruction (arch-specific). If so, this indicates the syscall was not completed and would be restarted when the signal handler returns, so we can cancel. If not, I assumed the syscall had already returned, and deferred cancellation. However, there is also a race condition - it's possible that the thread had not yet reached the syscall instruction at all, in which case, the syscall could block and never respond to the cancellation. Another small problem is that non-cancellable syscalls performed from a signal handler wrongly became cancellable, if the cancellation point flag was set when the signal handler was entered.
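The instruction-pointer check from this first idea could look something like the sketch below. It assumes x86-64, where the `syscall` instruction encodes as the two bytes `0F 05`; other architectures need their own test. `pc_at_syscall_insn` is a hypothetical helper, and in a real handler the pointer would come from the saved signal context.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: x86-64, where `syscall` is encoded as the bytes 0F 05.
 * Returns true if pc points at such an instruction. */
static bool pc_at_syscall_insn(const unsigned char *pc)
{
    return pc != NULL && pc[0] == 0x0f && pc[1] == 0x05;
}
```

A `false` result cannot distinguish "the syscall already returned" from "the thread has not reached the syscall yet", which is exactly the race described above.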

I'm looking at a new approach, and looking for feedback on it. The conditions that must be met:

1. Any cancellation request received prior to completion of the syscall must be acted upon before the syscall blocks for any significant interval of time, but not while it is pending restart due to interruption by a signal handler.
2. Any cancellation request received after completion of the syscall must be deferred to the next cancellation point.

The idea I have in mind requires specialized assembly for the cancellable syscall wrapper. The basic idea would be:

1. Push the address of the upcoming syscall instruction onto the stack.
2. Store the stack pointer in thread-local storage.
3. Test a cancellation flag from thread-local storage; jump to cancel routine if it is set.
4. Make the syscall.
5. Clear the pointer saved in thread-local storage.
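The control flow of those five steps can be modelled in C as below. This is only a model under stated assumptions: the real wrapper has to be assembly, because C cannot name the address of the syscall instruction and the compiler is free to move code around it. `struct cancel_state`, `self_cancel` and `cancellable_read_model` are hypothetical names, and returning `ECANCELED` stands in for jumping to the cancel routine.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical per-thread cancellation state. */
struct cancel_state {
    atomic_int cancel_requested;  /* set by the cancelling thread */
    atomic_uintptr_t saved_sp;    /* nonzero only while inside the wrapper */
};

static _Thread_local struct cancel_state self_cancel;

static ssize_t cancellable_read_model(int fd, void *buf, size_t count)
{
    /* Step 1 stand-in: the assembly version pushes the address of the
     * syscall instruction here. C cannot express that, so the slot is 0. */
    volatile uintptr_t frame_slot = 0;
    ssize_t ret;

    /* Step 2: publish the location of that slot ("the stack pointer"). */
    atomic_store(&self_cancel.saved_sp, (uintptr_t)&frame_slot);

    /* Step 3: act on a request that arrived before the syscall. */
    if (atomic_load(&self_cancel.cancel_requested)) {
        atomic_store(&self_cancel.saved_sp, 0);
        errno = ECANCELED;  /* model only: real code jumps to the cancel routine */
        return -1;
    }

    /* Step 4: the syscall itself. */
    ret = read(fd, buf, count);

    /* Step 5: leaving the cancellation point. */
    atomic_store(&self_cancel.saved_sp, 0);
    return ret;
}
```

Note that steps 2 and 3 are a store followed by a load of a different location. The default sequentially consistent atomics used here keep the two in order; an assembly version would probably need an explicit barrier (or a locked instruction on x86) between them, otherwise the wrapper and the cancelling thread could each miss the other's store.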

The cancel operation would then involve:

1. Set the cancellation flag in the target thread's thread-local storage.
2. Test the pointer in the target thread's thread-local storage; if it's not null, send a cancellation signal to the target thread.
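A sketch of the cancelling side, assuming the target's state is reachable through a pointer and that `SIGUSR1` stands in for whatever signal the library reserves for cancellation. `struct cancel_state`, `CANCEL_SIGNAL` and `request_cancel` are hypothetical names.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CANCEL_SIGNAL SIGUSR1  /* assumption: stand-in for the reserved signal */

/* Hypothetical per-thread cancellation state, owned by the target thread. */
struct cancel_state {
    atomic_int cancel_requested;  /* set here, tested by the wrapper */
    atomic_uintptr_t saved_sp;    /* nonzero while the target is in a wrapper */
};

/* Returns true if a cancellation signal was sent to the target. */
static bool request_cancel(pthread_t thread, struct cancel_state *target)
{
    /* Step 1: set the flag first, so a wrapper entered later sees it. */
    atomic_store(&target->cancel_requested, 1);

    /* Step 2: signal only if the target is currently inside a wrapper. */
    if (atomic_load(&target->saved_sp) != 0)
        return pthread_kill(thread, CANCEL_SIGNAL) == 0;

    return false;
}
```

If the pointer is null, nothing is sent: the target is not at a cancellation point and will notice the flag when it next enters one.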

The cancellation signal handler would then:

1. Check that the saved stack pointer (in the signal context) is equal to the saved pointer in the thread-local storage. If not, then the cancellation point was interrupted by a signal handler and there's nothing to do right now.
2. Check that the program counter register (saved in the signal context) is less than or equal to the address saved at the saved stack pointer. If so, this means the syscall is not yet complete, and we execute cancellation.
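The two checks can be written as a pure function of the register values, which keeps the arch-specific part (pulling the registers out of the signal context) separate. A sketch, with `cancel_signal_should_act` as a hypothetical name:

```c
#include <stdbool.h>
#include <stdint.h>

/* ctx_sp, ctx_pc: stack pointer and program counter from the signal context.
 * tls_saved_sp:   pointer the wrapper stored in thread-local storage, or 0.
 * The word at tls_saved_sp holds the address of the syscall instruction.
 *
 * Assumption: with glibc on x86-64 and _GNU_SOURCE defined, the handler
 * would read ctx_sp and ctx_pc from uc->uc_mcontext.gregs[REG_RSP] and
 * uc->uc_mcontext.gregs[REG_RIP]. */
static bool cancel_signal_should_act(uintptr_t ctx_sp, uintptr_t ctx_pc,
                                     uintptr_t tls_saved_sp)
{
    uintptr_t syscall_insn;

    /* Not inside a cancellable wrapper at all. */
    if (tls_saved_sp == 0)
        return false;

    /* Check 1: we interrupted something other than the wrapper itself,
     * for example another signal handler running on top of it. */
    if (ctx_sp != tls_saved_sp)
        return false;

    /* Check 2: at or before the syscall instruction means the syscall
     * has not completed. */
    syscall_insn = *(const uintptr_t *)tls_saved_sp;
    return ctx_pc <= syscall_insn;
}
```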

The only problem I see so far is in step 1 of the signal handler: if it decides not to act, then after the signal handler returns, the thread could be left blocking on the syscall, ignoring the pending cancellation request. For this, I see two potential solutions:

1. In this case, install a timer to deliver signals to the specific thread, essentially retrying every millisecond or so until we get lucky.
2. Raise the cancellation signal again, but return from the cancellation signal handler without unmasking the cancellation signal. It will automatically get unmasked when the interrupted signal handler returns, and then we can try again. This might interfere with behavior of cancellation points within the signal handler, though.
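The second option can be tried out in isolation. The sketch below assumes Linux, where the `uc_sigmask` stored in the handler's `ucontext_t` is the mask restored when the handler returns, so adding the signal there keeps it blocked afterwards. `SIGUSR1` stands in for the cancellation signal, `SIGUSR2` for some unrelated signal whose handler is interrupted, and the `in_other_handler` flag stands in for the stack-pointer mismatch check; all names are hypothetical.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>

#define CANCEL_SIGNAL SIGUSR1  /* assumption: stand-in for the reserved signal */

static volatile sig_atomic_t in_other_handler;  /* stands in for the sp check */
static volatile sig_atomic_t handler_runs;
static volatile sig_atomic_t acted_on_cancel;

static void cancel_handler(int sig, siginfo_t *info, void *ctx)
{
    ucontext_t *uc = ctx;
    (void)info;

    handler_runs++;
    if (in_other_handler) {
        /* Keep the signal blocked after we return, then make it pending
         * again. It is delivered once the interrupted handler returns and
         * the original mask is restored. */
        sigaddset(&uc->uc_sigmask, sig);
        raise(sig);
        return;
    }
    acted_on_cancel = 1;  /* model only: real code would cancel here */
}

static void other_handler(int sig)
{
    (void)sig;
    in_other_handler = 1;
    raise(CANCEL_SIGNAL);  /* a cancellation request arrives mid-handler */
    in_other_handler = 0;
}

/* Returns how many times the cancellation handler ran. */
static int demo_nested_retry(void)
{
    struct sigaction cancel_sa = {0}, other_sa = {0};
    sigset_t unblock;

    cancel_sa.sa_sigaction = cancel_handler;
    cancel_sa.sa_flags = SA_SIGINFO;
    sigemptyset(&cancel_sa.sa_mask);
    sigaction(CANCEL_SIGNAL, &cancel_sa, NULL);

    other_sa.sa_handler = other_handler;
    sigemptyset(&other_sa.sa_mask);
    sigaction(SIGUSR2, &other_sa, NULL);

    /* Make sure neither signal starts out blocked. */
    sigemptyset(&unblock);
    sigaddset(&unblock, CANCEL_SIGNAL);
    sigaddset(&unblock, SIGUSR2);
    sigprocmask(SIG_UNBLOCK, &unblock, NULL);

    raise(SIGUSR2);
    return handler_runs;
}
```

In this demo the cancellation handler runs once inside `other_handler`, declines to act, and runs a second time as soon as `other_handler` returns, with no timer involved.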

Any thoughts on which approach is best, or if there are other more fundamental flaws I'm missing?

Solution 2 feels like less of a hack. I don't think it would cause the problem you suggest, because cancellable syscalls called within the signal handler will check the cancellation flag in TLS, which must have already been set if the cancellation signal handler has run and monkeyed with the signal mask anyway.

(It seems like it would be much easier for implementers if every blocking syscall took a sigmask parameter a la pselect()).
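To illustrate what the sigmask parameter buys, here is a sketch using `pselect()` itself. It assumes the caller keeps the cancellation signal blocked everywhere else, so the signal can only be delivered while the thread is inside `pselect()`; a request that is already pending interrupts the call with `EINTR` instead of being missed. `read_with_cancel_window` and `note_cancel` are hypothetical names.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/select.h>
#include <unistd.h>

static volatile sig_atomic_t cancel_seen;

/* Hypothetical handler: only records that the signal arrived. */
static void note_cancel(int sig)
{
    (void)sig;
    cancel_seen = 1;
}

/* Assumes cancel_sig is blocked in the calling thread. Waits for fd to
 * become readable with cancel_sig unblocked only for the duration of
 * pselect(), then reads. Returns -1 with errno == EINTR if the handler ran
 * while waiting. */
static ssize_t read_with_cancel_window(int fd, void *buf, size_t count,
                                       int cancel_sig)
{
    sigset_t during;
    fd_set readable;

    /* Start from the current mask and open a window for cancel_sig. */
    if (pthread_sigmask(SIG_SETMASK, NULL, &during) != 0)
        return -1;
    sigdelset(&during, cancel_sig);

    FD_ZERO(&readable);
    FD_SET(fd, &readable);

    /* The mask swap and the wait happen atomically inside the kernel. */
    if (pselect(fd + 1, &readable, NULL, NULL, NULL, &during) < 0)
        return -1;

    return read(fd, buf, count);
}
```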