开发者

Interpreting segfault messages

开发者 https://www.devze.com 2022-12-25 18:06 出处:网络
What is the correct interpretation of the following segfault messages? segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]

What is the correct interpretation of the following segfault messages?

segfault at 10 ip 00007f9bebcca90d sp 00007fffb62705f0 error 4 in libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]
segfault at 10 ip 00007fa44d78890d sp 00007fff43f6b720 error 4 in libQtWebKit.so.4.开发者_Python百科5.2[7fa44d2f8000+f6f000]
segfault at 11 ip 00007f2b0022acee sp 00007fff368ea610 error 4 in libQtWebKit.so.4.5.2[7f2aff9f7000+f6f000]
segfault at 11 ip 00007f24b21adcee sp 00007fff7379ded0 error 4 in libQtWebKit.so.4.5.2[7f24b197a000+f6f000]


This is a segfault due to following a null pointer trying to find code to run (that is, during an instruction fetch).

If this were a program, not a shared library

Run addr2line -e yourSegfaultingProgram 00007f9bebcca90d (and repeat for the other instruction pointer values given) to see where the error is happening. Better, get a debug-instrumented build, and reproduce the problem under a debugger such as gdb.

Since it's a shared library

You're hosed, unfortunately; it's not possible to know where the libraries were placed in memory by the dynamic linker after-the-fact. Reproduce the problem under gdb.

What the error means

Here's the breakdown of the fields:

  • address (after the at) - the location in memory the code is trying to access (it's likely that 10 and 11 are offsets from a pointer we expect to be set to a valid value but which is instead pointing to 0)

  • ip - instruction pointer, ie. where the code which is trying to do this lives

  • sp - stack pointer

  • error - An error code for page faults; see below for what this means on x86 (link).

    /*
     * Page fault error code bits:
     *
     *   bit 0 ==    0: no page found       1: protection fault
     *   bit 1 ==    0: read access         1: write access
     *   bit 2 ==    0: kernel-mode access  1: user-mode access
     *   bit 3 ==                           1: use of reserved bit detected
     *   bit 4 ==                           1: fault was an instruction fetch
     *   bit 5 ==                           1: protection keys block access
     *   bit 15 ==                          1: SGX MMU page-fault
     */
    


Error 4 means "The cause was a user-mode read resulting in no page being found.". There's a tool that decodes it here.

Here's the definition from the kernel. Keep in mind that 4 means that bit 2 is set and no other bits are set. If you convert it to binary that becomes clear.

/*
 * Page fault error code bits
 *      bit 0 == 0 means no page found, 1 means protection fault
 *      bit 1 == 0 means read, 1 means write
 *      bit 2 == 0 means kernel, 1 means user-mode
 *      bit 3 == 1 means use of reserved bit detected
 *      bit 4 == 1 means fault was an instruction fetch
 */
#define PF_PROT         (1<<0)
#define PF_WRITE        (1<<1)
#define PF_USER         (1<<2)
#define PF_RSVD         (1<<3)
#define PF_INSTR        (1<<4)

Now then, "ip 00007f9bebcca90d" means the instruction pointer was at 0x00007f9bebcca90d when the segfault happened.

"libQtWebKit.so.4.5.2[7f9beb83a000+f6f000]" tells you:

  • The object the crash was in: "libQtWebKit.so.4.5.2"
  • The base address of that object "7f9beb83a000"
  • How big that object is: "f6f000"

If you take the base address and subtract it from the ip, you get the offset into that object:

0x00007f9bebcca90d - 0x7f9beb83a000 = 0x49090D

Then you can run addr2line on it:

addr2line -e /usr/lib64/qt45/lib/libQtWebKit.so.4.5.2 -fCi 0x49090D
??
??:0

In my case it wasn't successful, either the copy I installed isn't identical to yours, or it's stripped.


Let's go to the source -- 2.6.32, for example. The message is printed by show_signal_msg() function in arch/x86/mm/fault.c if the show_unhandled_signals sysctl is set.

"error" is not an errno nor a signal number, it's a "page fault error code" -- see definition of enum x86_pf_error_code.

"[7fa44d2f8000+f6f000]" is starting address and size of virtual memory area where offending object was mapped at the time of crash. Value of "ip" should fit in this region. With this info in hand, it should be easy to find offending code in gdb.


You can fix it with the following steps :

  • dmesg

Ex : [4970814.649014] upowerd[46459]: segfault at 8 ip 000055ce91269328 sp 00007fff71b98480 error 4 in upowerd[55ce91248000+39000] [4970840.152464] upowerd[46512]: segfault at 8 ip 000055c18f8e5328 sp 00007fffa63df280 error 4 in upowerd[55c18f8c4000+39000]

  • Locate the library, here you have upowerd

  • Re-install it, remove and install upowerd

  • dmesg

Ex : normally, you will have it deleted and mentioned at the last line

[4970942.517131] upowerd[47466]: segfault at 8 ip 00005637fd95b328 sp 00007ffeb77c3460 error 4 in upowerd (deleted)[5637fd93a000+39000]

Best regards,

Moustapha Kourouma

0

精彩评论

暂无评论...
验证码 换一张
取 消