I frequently encounter the following debugging scenario:
A tester provides reproduction steps for a bug. To find out where the problem is, I play with these steps to get the minimum necessary reproduction steps. Sometimes I am lucky and find that a minor change to the steps makes the problem go away.
The job then turns into finding the difference in code workflow between these two sets of reproduction steps. This is tedious and painful, especially when you are working on a large code base and the scenario goes through a lot of code and involves many state changes you are not familiar with.
So I was wondering whether there are any tools available to compare "code workflows". Since I have learned the "wt" command in WinDbg, I thought it might be possible: for example, I could run "wt" on some outermost functions with the two different sets of reproduction steps and then compare the outputs. It should then be easy to see where the code flow starts to diverge.
But the problem with WinDbg is that "wt" is quite slow (maybe I should write to a log file instead of outputting to the screen) and not very user-friendly compared with the Visual Studio debugger. So I want to ask: is there any existing tool for this? Or how feasible would it be to develop a plug-in for the Visual Studio debugger to support this functionality?
Thanks
I'd run it under a profiler in "coverage" mode, then use diff on the results to see which parts of the code were executed in one run but not the other.
Sorry, I don't know of a tool that can do what you want, but even if it existed, it doesn't sound like the quickest approach to finding out where the lower-layer code is failing.
I would recommend instrumenting your layer's code with high-level logs so you can tell which module fails, stalls, etc. In debug builds, your logger can write to a file, to the output debug window, and so on.
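A minimal sketch of what that can look like (Python here, with a hypothetical OrderService module; the original point is language-agnostic, and in native Windows code the same pattern would go through your own logging framework and OutputDebugString):

    # Minimal sketch: high-level, module-tagged logging so a failing or
    # stalling module can be identified from the trace alone.
    import logging

    logging.basicConfig(
        filename="debug_trace.log",   # in debug builds, write to a file
        level=logging.DEBUG,
        format="%(asctime)s %(threadName)s %(name)s %(levelname)s %(message)s",
    )
    logger = logging.getLogger("OrderService")   # hypothetical module/layer name

    def submit_order(order_id):
        logger.info("submit_order start id=%s", order_id)
        try:
            # ... real work goes here ...
            logger.info("submit_order done id=%s", order_id)
        except Exception:
            logger.exception("submit_order failed id=%s", order_id)
            raise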
In general, failing fast and using exceptions are good ways to find out easily where things go bad.
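For example (a hypothetical function, just to illustrate the fail-fast idea): validate inputs where they enter your layer and raise immediately with context, rather than letting a bad value propagate and blow up somewhere unrelated.

    # Fail fast: reject bad state at the boundary, with enough context to locate it.
    def apply_discount(price, rate):
        if not (0.0 <= rate <= 1.0):
            raise ValueError(f"discount rate out of range: {rate!r}")
        return price * (1.0 - rate)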
Doing something after the fact is not going to cut it, since your real problem is reproducing the issue in the first place.
The issue with bugs is seldom some internal wackiness but usually what the user is actually doing. If you log all the commands the user enters, they can simply send you the log. You can substitute button clicks, mouse selections, etc. This has some cost, but certainly much less than something that keeps track of every method visited.
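A sketch of the idea (file name and command names are made up): append each user-level command to a small log that the user can send back with the bug report.

    # Append each user-level command to a replayable log file.
    import json, time

    COMMAND_LOG = "commands.log"   # hypothetical log file name

    def record_command(name, **params):
        entry = {"t": time.time(), "cmd": name, "params": params}
        with open(COMMAND_LOG, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")

    # Call this from your button/menu/command handlers, e.g.:
    record_command("open_file", path="report.xlsx")
    record_command("apply_filter", column="Region", value="EMEA")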
I am assuming that if you have a large application, you have good logging or tracing in place.
I work on a large server product with over 40 processes and over one million lines of code. Most of the time the error in the trace file is enough to identify the location of the problem. However, sometimes the error I see in the trace file is caused by some earlier code, and the reason for it can be hard to spot. Then I use a comparative debugging technique:
- Reproduce the first scenario and copy the trace to a new file (if the application is multi-threaded, make sure you only keep the trace for the thread that does the work).
- Reproduce the second scenario and copy the trace to a new file.
- Remove the timestamps from the log files (I use awk or sed for this; a scripted alternative is sketched after this list).
- Compare the log files with WinMerge or similar, to see where and how they diverge.
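If you prefer a single script to the awk/sed step, something like this Python sketch strips the timestamps and diffs the two traces in one go (the timestamp regex is an assumption; adjust it to your trace format):

    # Strip leading timestamps from two trace files and show a unified diff,
    # which makes the first point of divergence easy to spot.
    import difflib
    import re
    import sys

    # Assumes lines start with e.g. "2011-04-05 12:34:56.789 "; adjust as needed.
    TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+\s*")

    def strip_timestamps(path):
        with open(path, encoding="utf-8", errors="replace") as f:
            return [TIMESTAMP.sub("", line) for line in f]

    def main(trace_a, trace_b):
        a, b = strip_timestamps(trace_a), strip_timestamps(trace_b)
        sys.stdout.writelines(difflib.unified_diff(a, b, fromfile=trace_a, tofile=trace_b))

    if __name__ == "__main__":
        main(sys.argv[1], sys.argv[2])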
This technique can be a little time-consuming, but it is much quicker than stepping through thousands of lines in the debugger.
Another useful technique is producing UML sequence diagrams from trace files. For this you need the function entry and exit points logged consistently. Then write a small script to parse your trace files and use sequence.jar to produce UML diagrams as PNG files. This is a great way to understand the logic of code you haven't touched in a while. I wrapped a small awk script in a batch file: I just provide the trace file and the line number to start from, and it untangles the threads, generates the input text for sequence.jar, and then runs it to create the UML diagram.
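The entry/exit matching itself is simple; here is a rough Python sketch (the "<thread-id> ENTER/EXIT <function>" trace format is an assumption, and the indented output is only a placeholder to adapt to whatever input syntax your copy of sequence.jar expects):

    # Rebuild call nesting from ENTER/EXIT trace lines for a single thread.
    import sys

    def call_tree(trace_path, thread_id):
        depth = 0
        with open(trace_path, encoding="utf-8", errors="replace") as f:
            for line in f:
                parts = line.split()
                if len(parts) < 3 or parts[0] != thread_id:
                    continue
                event, func = parts[1], parts[2]
                if event == "ENTER":
                    yield "  " * depth + func
                    depth += 1
                elif event == "EXIT" and depth > 0:
                    depth -= 1

    if __name__ == "__main__":
        for call in call_tree(sys.argv[1], sys.argv[2]):
            print(call)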