I have read (see here) that the "common practice" for printing a stack trace with backtrace() from a fault signal handler (e.g. when handling SIGSEGV) under Linux is to:
1 Get the instruction pointer (EIP or RIP) from the undocumented sigcontext structure.
2 Replace the 2nd frame in the stack trace with the instruction pointer, since the first frame is the signal handler and the 2nd frame is supposed to be within libc, in the sigaction code, which has overwritten the original frame in which the fault occurred.
3 Print the backtrace starting from the newly replaced 2nd frame.
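A minimal sketch of those three steps, assuming Linux with glibc on x86_64 (or i386). Instead of casting to the raw sigcontext structure, it reads the instruction pointer from the `ucontext_t` that the kernel passes to an `SA_SIGINFO` handler, via `uc_mcontext.gregs[REG_RIP]` (`REG_RIP`/`REG_EIP` require `_GNU_SOURCE`). The names `fault_ip`, `segv_handler` and `install_segv_handler` are illustrative only. The warm-up call to `backtrace()` follows the note in backtrace(3) that libgcc is loaded lazily on first use, which can call `malloc()`, something you don't want happening inside a SIGSEGV handler.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes REG_RIP / REG_EIP in <ucontext.h> */
#endif
#include <execinfo.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>
#include <unistd.h>

#define MAX_FRAMES 64

/* Step 1: the faulting instruction pointer, taken from the machine context
   the kernel passes to an SA_SIGINFO handler (glibc/Linux specific). */
static void *fault_ip(void *ucontext)
{
    ucontext_t *uc = ucontext;
#if defined(__x86_64__)
    return (void *)uc->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    return (void *)uc->uc_mcontext.gregs[REG_EIP];
#else
    (void)uc;
    return NULL;                /* other architectures: not covered here */
#endif
}

static void segv_handler(int sig, siginfo_t *info, void *ucontext)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    (void)info;

    if (n > 1) {
        /* Step 2: overwrite frame 1 (assumed to be libc's signal
           trampoline) with the faulting instruction pointer. */
        frames[1] = fault_ip(ucontext);
        /* Step 3: print from the replaced frame onwards.
           backtrace_symbols_fd() writes to the fd without calling malloc(). */
        backtrace_symbols_fd(frames + 1, n - 1, STDERR_FILENO);
    }

    /* Fall back to the default action so the process still dies (and can
       dump core); the re-raised signal is delivered once the handler returns. */
    signal(sig, SIG_DFL);
    raise(sig);
}

static int install_segv_handler(void)
{
    struct sigaction sa;
    void *warm_up[1];

    /* Load libgcc now rather than inside the handler (see backtrace(3)). */
    backtrace(warm_up, 1);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

Re-raising with the default action, rather than calling `exit()`, keeps the normal core dump available after the trace has been printed.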
In my testing (on x86_64 with a 2.6 kernel), however, the original frame in which the fault occurred is in fact present in the stack trace returned by backtrace(), as the 3rd frame: the first is the signal handler and the 2nd is in libc's signal handling code.
Is this change in kernel signal handling documented somewhere that you can reference for me?
The upshot seems to be that you can avoid replacing any frames with the instruction pointer and simply print the stack trace from backtrace() starting with frame 3, but I want confirmation that this is known behavior and the correct way to do it.
This is an interesting thing to try, but it's not really portable and will probably never be 100% reliable. So just implement it the way you describe, if that works on your platform, and include a couple of small unit tests for it so that you know right away if some system you use in the future doesn't behave the same way. After all, when this code is invoked, you're already screwed, so just do the best you can and move along.
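One way such a unit test could look, as a sketch assuming glibc on x86_64 or i386: deliberately fault, capture `backtrace()` in the handler, look for the faulting instruction pointer among the returned frames, then `siglongjmp()` back out so the test process survives. `find_fault_frame()`, `probe_handler()` and `bad_ptr` are hypothetical names. The behavior reported in the question corresponds to a result of 2; a different index (or -1) on a new platform is the early warning you want.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* REG_RIP / REG_EIP */
#endif
#include <execinfo.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <ucontext.h>

#define MAX_FRAMES 64

static sigjmp_buf recover;
static volatile sig_atomic_t fault_frame;
/* volatile so the compiler cannot see that the pointer is NULL and
   replace the store with a trap instruction */
static volatile int *volatile bad_ptr = NULL;

static void probe_handler(int sig, siginfo_t *info, void *ucontext)
{
    void *frames[MAX_FRAMES];
    int n = backtrace(frames, MAX_FRAMES);
    void *ip = NULL;

    (void)sig;
    (void)info;
#if defined(__x86_64__)
    ip = (void *)((ucontext_t *)ucontext)->uc_mcontext.gregs[REG_RIP];
#elif defined(__i386__)
    ip = (void *)((ucontext_t *)ucontext)->uc_mcontext.gregs[REG_EIP];
#else
    (void)ucontext;             /* IP not extracted on other architectures */
#endif

    fault_frame = -1;           /* handler ran; IP not found so far */
    for (int i = 0; ip != NULL && i < n; i++) {
        if (frames[i] == ip) {
            fault_frame = i;
            break;
        }
    }
    siglongjmp(recover, 1);     /* resume in find_fault_frame(), skipping the store */
}

/* Deliberately segfaults once. Returns the backtrace() index holding the
   faulting instruction pointer, -1 if it was not found, or -2 if the
   handler never ran. */
static int find_fault_frame(void)
{
    struct sigaction sa, old;

    fault_frame = -2;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0)
        return -2;

    if (sigsetjmp(recover, 1) == 0)   /* 1: also restore the signal mask */
        *bad_ptr = 42;                /* raises SIGSEGV */

    sigaction(SIGSEGV, &old, NULL);   /* put the previous handler back */
    return fault_frame;
}
```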
A totally different alternative, which you can use either alongside your scheme or instead of it, is to write a script that Linux invokes when a program dumps core. The script can then run gdb in batch mode on the core file to get the backtrace and send you an email or whatever.
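On kernels that support piping core dumps (see core(5)), such a script can be registered with something like `|/usr/local/bin/core-helper %e %p` in `/proc/sys/kernel/core_pattern`; the helper path is hypothetical, and the core image arrives on the helper's standard input. The gdb part then amounts to running `gdb -batch -ex bt <executable> <core>`. Below is a sketch of that step in C, assuming the helper has already saved the core to a file and knows the executable's path (how it finds that path is left out). `build_gdb_argv()` and `print_core_backtrace()` are illustrative names.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fill argv for: gdb -batch -ex bt <exe> <core>  (argv must hold 7 slots). */
static void build_gdb_argv(const char *argv[7], const char *exe, const char *core)
{
    argv[0] = "gdb";
    argv[1] = "-batch";         /* run the -ex commands, then exit */
    argv[2] = "-ex";
    argv[3] = "bt";             /* print the backtrace of the current thread */
    argv[4] = exe;
    argv[5] = core;
    argv[6] = NULL;
}

/* Run gdb on a core file; its output goes to our stdout.
   Returns gdb's exit status, or -1 if it could not be run or waited for. */
static int print_core_backtrace(const char *exe, const char *core)
{
    const char *argv[7];
    build_gdb_argv(argv, exe, core);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);             /* exec failed, e.g. gdb not installed */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For a multi-threaded program, `thread apply all bt` in place of `bt` dumps every thread's stack; piping gdb's output into mail is then a matter of redirecting the child's stdout.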