What is a good way to create a string for crash reporting in Win32 C++ that reflects the cause of the crash?


We're using Fogbugz for tracking issues and I am in the middle of writing a C++ wrapper around the XML API for Fogbugz.

The best practice seems to be to use the "scout" field so that similar/same crashes are just counted but not reported again. To do that we need a unique string for a particular cause of a crash.

In Win32 - after getting a dmp file or other crash handler output - what is a good way to make a unique string for a crash? (We're going to create a dmp file and send it to the FogBugz server.)

In previous postings/articles/etc Joel has made various suggestions, but many of those relied on a language like C# that has reflection and access to a lot of information that is either harder or impossible to get from C++.

Has anyone else used stack traces or similar data to make scout entries in FogBugz?

EDIT To clarify - we don't want a unique id for every incident - there are likely crashes that have the same code path, and we want to capture that. I was thinking that we would get the last few stack calls that are in our code (not ones from Win32 DLLs) - but I'm not sure how to go about doing this.

Reporting every crash as unique is not right. Reporting all crashes under the same case is not right. Different users repeating a scenario that causes a crash should map to the same incident.

EDIT

What I think we want is a general "signature" of a crash - based on what is on the stack. Similar stacks should have the same signature. For example - take the top 5 methods that are in our app and then the first call (if any) we make into an MS DLL. This would probably be sufficient for a signature and would likely correlate the crashes that are "the same".

So how does one get the list of methods on the stack? And how can you tell if they are from your own app or in another DLL?

EDIT - NOTE We want to create a "bucket id"/signature while in the exception handler so that we can create the minidump and send it to fogbugz as a scout description. Alternatively we can load up the dump on the next start of the app and send it then with a signature we generate.
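Something along these lines is what I have in mind inside the handler - just a sketch, the helper name and the 5-frame cutoff are made up, and it only checks the main EXE:

#include <windows.h>
#include <string>
#include <sstream>

// Sketch only: inside the exception filter, capture the stack and keep the
// first few frames that fall inside our own EXE, recorded as module-relative
// offsets so ASLR does not change the signature between runs.
// Best effort - the process is already crashing, so keep the work minimal.
static std::string BuildCrashSignature()
{
    void* frames[62] = {};
    USHORT count = CaptureStackBackTrace(0, 62, frames, nullptr);

    HMODULE ourExe = GetModuleHandleW(nullptr);   // extend with your own DLLs as needed

    std::ostringstream sig;
    int taken = 0;
    for (USHORT i = 0; i < count && taken < 5; ++i)
    {
        HMODULE mod = nullptr;
        if (!GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                                GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                                reinterpret_cast<LPCWSTR>(frames[i]), &mod))
            continue;
        if (mod != ourExe)
            continue;                             // skip frames in Win32 / CRT DLLs

        uintptr_t offset = reinterpret_cast<uintptr_t>(frames[i]) -
                           reinterpret_cast<uintptr_t>(mod);
        sig << std::hex << offset << ';';
        ++taken;
    }
    return sig.str();
}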


In my project I use the memory address of the crash as a "unique" ID.
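Roughly like the following sketch: report the faulting address as module plus offset rather than the raw pointer, so the ID stays stable across ASLR/rebasing (the function name is illustrative):

#include <windows.h>
#include <string>
#include <sstream>

// Sketch: format ExceptionAddress as "module+0xOFFSET".
std::string CrashAddressId(EXCEPTION_POINTERS* info)
{
    void* addr = info->ExceptionRecord->ExceptionAddress;

    HMODULE mod = nullptr;
    char moduleName[MAX_PATH] = "unknown";
    if (GetModuleHandleExW(GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS |
                           GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT,
                           reinterpret_cast<LPCWSTR>(addr), &mod))
    {
        GetModuleFileNameA(mod, moduleName, MAX_PATH);
    }

    std::ostringstream id;
    id << moduleName << "+0x" << std::hex
       << (reinterpret_cast<uintptr_t>(addr) - reinterpret_cast<uintptr_t>(mod));
    return id.str();
}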


IMO the best thing you can use is the bucket id from dump analysis. With properly configured Debugging Tools for Windows (windbg), you can run !analyze -v and classify your dumps into different buckets based on the bucket id. The bucket id guarantees that if two dumps are the same, their bucket ids will be the same. That solves part of the puzzle.

Many times two dumps rooted in the same problem will produce different bucket ids (maybe a version difference, say your 1.0 and 1.1 both crash at the same point). You can use the faulting module and stack signature to correlate bugs from the same point of fault.

There will be certain things that cause very random dumps (e.g. heap corruption, where the faulting module is typically just the victim). Therefore dump analysis should be considered best-effort. When you can't, you can't.
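If you want to automate that classification, something like the following sketch could drive cdb.exe over a dump and scrape the bucket id from the !analyze -v output. It assumes cdb.exe is on the PATH and that the report contains a line starting with "BUCKET_ID:" (adjust the key, e.g. FAILURE_BUCKET_ID, to taste):

#include <cstdio>
#include <cctype>
#include <string>

// Sketch: run "cdb.exe -z <dump> -c "!analyze -v; q"" and keep the last
// BUCKET_ID line that scrolls past.
std::string QueryBucketId(const std::string& dumpPath)
{
    std::string cmd = "cdb.exe -z \"" + dumpPath + "\" -c \"!analyze -v; q\" 2>&1";

    std::string bucketId;
    if (FILE* pipe = _popen(cmd.c_str(), "r"))
    {
        char line[1024];
        while (std::fgets(line, sizeof(line), pipe))
        {
            std::string s(line);
            size_t pos = s.find("BUCKET_ID:");
            if (pos != std::string::npos)
            {
                bucketId = s.substr(pos + 10);                 // text after the key
                while (!bucketId.empty() &&
                       std::isspace(static_cast<unsigned char>(bucketId.back())))
                    bucketId.pop_back();                       // trim trailing newline
            }
        }
        _pclose(pipe);
    }
    return bucketId;
}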


I used something like this to generate exceptions in my last app (MSVC), so every error would get logged with the source file and line it occurred on:

#include <string>

class Error {
public:
    Error(std::string file, int line, std::string error);
    // ...
};

// Note: <windows.h> already defines an ERROR macro, so pick a
// non-clashing name if you include it.
#define ERROR(err) Error(__FILE__, __LINE__, (err))
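Used roughly like this (illustrative only), so whatever catches the exception can log exactly where it was thrown:

void LoadSettings(bool ok)
{
    if (!ok)
        throw ERROR("failed to load settings");   // stamps __FILE__ / __LINE__ automatically
}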


It's probably a little bit late, but I will add my solution here, too, in case it can help other people. You can do this using tools from "Debugging Tools for Windows", for example windbg.exe or, better, kd.exe. Running the command kd.exe -z "path_to_dump.dmp" -c "kb;q" >> dumpstack.txt, you might get the following result:

Microsoft (R) Windows Debugger Version 10.0.15063.400 X86 Copyright (c) Microsoft Corporation. All rights reserved.

Loading Dump File [d:\work\bugs\14122\myexe.exe.2624.dmp] User Mini Dump File with Full Memory: Only application data is available

************* Symbol Path validation summary **************
Response                         Time (ms)     Location
Deferred                                       srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows 10 Version 15063 MP (4 procs) Free x86 compatible
Product: WinNt, suite: SingleUserTS
15063.0.x86fre.rs2_release.170317-1834
Machine Name:
Debug session time: Fri Oct 13 00:09:01.000 2017 (UTC + 1:00)
System Uptime: 0 days 0:18:33.797
Process Uptime: 0 days 0:03:40.000
................................................................
.....................................................
Loading unloaded module list
..............................
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a40.2580): Security check failure or stack buffer overrun - code c0000409 (first/second chance not available)
eax=00000001 ebx=00000000 ecx=00000007 edx=77cc4350 esi=00000000 edi=00000000
eip=62ae7666 esp=0b75e17c ebp=0b75e1a8 iopl=0         nv up ei pl nz na po nc
cs=001b  ss=0023  ds=0023  es=0023  fs=003b  gs=0000             efl=00000202
msvcr120!abort+0x28:
62ae7666 cd29            int     29h
0:068> kd: Reading initial command 'kb;q'
ChildEBP RetAddr  Args to Child              
0b75e178 62addc5f 935dda1f 00000000 00000000 msvcr120!abort+0x28
0b75e1a8 0b75e7d4 62a9b436 0b75e1dc 62a52aa5 msvcr120!terminate+0x33
WARNING: Frame IP not in any known module. Following frames may be wrong.
0b75e1ac 62a9b436 0b75e1dc 62a52aa5 00000000 0xb75e7d4
0b75e1b4 62a52aa5 00000000 62a59740 0b75e7d4 msvcr120!__FrameUnwindToState+0x89
0b75e1c8 62a52b33 00000000 00000000 00000000 msvcr120!_EH4_CallFilterFunc+0x12
0b75e1f4 62a5a0f3 62b1f7b8 62a4f7c6 0b75e324 msvcr120!_except_handler4_common+0x8e
0b75e214 77cd6152 0b75e324 0b75e7c4 0b75e344 msvcr120!_except_handler4+0x1e
0b75e238 77cd6124 0b75e324 0b75e7c4 0b75e344 ntdll!ExecuteHandler2+0x26
0b75e30c 77cc4266 0b75e324 0b75e344 0b75e324 ntdll!ExecuteHandler+0x24
0b75e30c 74cf28f2 0b75e324 0b75e344 0b75e324 ntdll!KiUserExceptionDispatcher+0x26
0b75e684 62a59339 e06d7363 00000001 00000003 KERNELBASE!RaiseException+0x62
0b75e6c4 6001821c 0b75e6e4 6004e1bc 946a8f2a msvcr120!_CxxThrowException+0x5b
0b75e6f8 60018042 0b75e720 946a8efa ffffffff mymodule!FunctionC+0x7c
0b75e730 60016544 946a8ece ffffffff 092889d8 mymodule!FunctionB+0x32
0b75e754 600166b8 00842338 6000588d 00000001 myothermodule!FunctionB+0x44

From this stack, you can create a unique bucket if you take, for example, only your own methods from the stack and concatenate them in a string: "mymodule!FunctionC+0x7c;mymodule!FunctionB+0x32;myothermodule!FunctionB+0x44". In order for this to work, you need access to your own symbol server, either using the environment variable _NT_SYMBOL_PATH or with the -y command line switch. You can alternatively create a string from the return addresses only (second column): "62addc5f,0b75e7d4,62a9b436,62a52aa5,62a52b33,62a5a0f3,77cd6152,77cd6124,77cc4266,74cf28f2,62a59339,6001821c,60018042,60016544,600166b8"
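To turn that into code, a rough sketch (my own; the module names are the ones from the example above, substitute your real ones):

#include <fstream>
#include <sstream>
#include <string>
#include <vector>

// Sketch: scan the kb output saved to dumpstack.txt and keep only the frames
// that resolved into our own modules, joined into one signature string.
std::string SignatureFromKbOutput(const std::string& path)
{
    const std::vector<std::string> ourModules = { "mymodule!", "myothermodule!" };

    std::ifstream in(path);
    std::string line;
    std::ostringstream sig;
    while (std::getline(in, line))
    {
        for (const std::string& mod : ourModules)
        {
            size_t pos = line.find(mod);
            if (pos != std::string::npos)
            {
                sig << line.substr(pos) << ';';   // e.g. "mymodule!FunctionC+0x7c;"
                break;
            }
        }
    }
    return sig.str();
}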


Just use an MD5 string generated from the dump file and you will likely get a unique string for every crash.


I would start with collecting data on how often every function in your code has shown up in a crash report stack trace. Every report would have to be added to some kind of database, and every function would have to be indexed so that you could later query which functions seem to crash more often than others. (And of course, functions like main() will be in every report, but that's understandable.)

Or, if you think that only your own code is the problem, you could just remove all the external entries from the crash stack traces and then hash the rest (your own functions). That way you could see if any particular call chain of your own functions causes a crash repeatedly, no matter what external functions have been called in between.
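The hashing itself can be as simple as FNV-1a over the frame names, for example (a sketch; the function name and frame-list format are illustrative):

#include <cstdint>
#include <string>
#include <vector>

// Sketch: 64-bit FNV-1a over the frames that belong to our own code, giving a
// compact key that groups identical call chains under one scout entry.
uint64_t HashOwnFrames(const std::vector<std::string>& ownFrames)
{
    uint64_t hash = 14695981039346656037ULL;          // FNV offset basis
    for (const std::string& frame : ownFrames)
    {
        for (unsigned char c : frame)
        {
            hash ^= c;
            hash *= 1099511628211ULL;                 // FNV prime
        }
        hash ^= static_cast<unsigned char>(';');      // frame separator
        hash *= 1099511628211ULL;
    }
    return hash;
}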

Then of course, some of the more complicated problems will not be captured this way anyway, as the stack trace will be completely different. To help with that, you could record other data from your application along with the stack trace in every report (sizes of buffers, counters, states of different parts of the application and so on) and then do some statistics on that.
