I have an application running on Solaris 8 (SunOS 5.8 Generic_108528-27 sun4u sparc SUNW,Sun-Fire-880). It ran fine for several days, but recently it crashed. A watchdog module restarted the application after the crash, but it then crashed again and again. After examining the core dumps, I found that it crashed inside system calls such as poll, write and send. I examined the contents of the variables passed to these functions and they looked fine. I have no idea how to troubleshoot this. Can anyone give some guidance on where to go from here? Thanks in advance.
Below is one of the core dumps, in this case a crash in poll:
bash$ gdb applx applx.core
GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (sparc-sun-solaris2.5), Copyright 1996 Free Software Foundation, Inc...
warning: exec file is newer than core file.
Core was generated by `applx -h'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/lib/libsocket.so.1...done.
Reading symbols from /usr/lib/libnsl.so.1...done.
Reading symbols from /usr/lib/libgen.so.1...done.
Reading symbols from /usr/lib/libc.so.1...done.
Reading symbols from /usr/lib/libdl.so.1...done.
Reading symbols from /usr/lib/libmp.so.2...done.
Reading symbols from /usr/platform/SUNW,Sun-Fire-880/lib/libc_psr.so.1...done.
#0 0xff219ec4 in _libc_poll ()
(gdb) bt
#0 0xff219ec4 in _libc_poll ()
#1 0xff1cccac in _select ()
#2 0x1cf08 in loop () at /home/ian123/applx/src/task.c:1450
#3 0x1e0d4 in state_start (local=0) at /home/ian123/applx/src/state.c:1047
#4 0x1a0f4 in main (argc=537600, argv=0x83400) at /home/ian123/applx/src/main.c:578
(gdb) up
#1 0xff1cccac in _select ()
(gdb) up
#2 0x1cf08 in loop () at /home/ian123/applx/src/task.c:1450
1450 r = select(maxfd, rfdsp, wfdsp, efdsp, tvp);
(gdb) p maxfd
$1 = 23
(gdb) p rfdsp
$2 = (fd_set *) 0xb8020
(gdb) p wfdsp
$3 = (fd_set *) 0x0
(gdb) p efdsp
$4 = (fd_set *) 0x0
(gdb) p tvp
$5 = (struct timeval *) 0xb81a0
(gdb) p *rfdsp
$6 = {fds_bits = {7610424, 0 }}
(gdb) p *tvp
$7 = {tv_sec = 0, tv_usec = 380002}

When I'm investigating a segfault and I have no idea where it's happening, I use the following gdb command:
x/1i <program_counter>
(Replace <program_counter> with your architecture's program counter register, e.g. $eip on x86. GDB's generic $pc alias should also work on SPARC.)
That shows the faulting instruction. From there I examine registers that contain memory addresses.
If GDB can show you the source code where the segmentation fault occurred, this may quickly lead to an understanding of the problem.