Possible Duplicate:
What's your favorite profiling tool (for C++)
Are there any good tools to profile a source code which is mix of of C and C++. What are the pros and cons of any, and which ones have you used and would reccomend for usage. Please do not get me a list of tools from google. I can do that开发者_运维百科 too, what i want is to leverage the personal experience of someone who has used these tools and knows the pros and cons about them.
Thanks in advance.I've found gprof to be the best CPU hotspot profiler, and Google Performance Tools to be the best sampling profiler. Both work for C and C++.
In my opinion there are no good profiling tools on Windows.
GNU gprof pros and cons
- GCC only
- Works with C and C++
- Only treats CPU time, and code inside the binary, you need everything you wish to profile statically linked in
- Very accurate
- Adds a small overhead to execution
Google Performance Tools pros and cons
- I think it requires the GNU tool chain
- Occasionally fails to identify symbols
- Very customizable
- Outputs to a huge variety of formats, including the Callgrind format, and automatically loads KCacheGrind for you
- Has various memory profiling tools also
- Is a sampling profiler, with minimal overhead
Related useful questions and answers
- Alternative to -pg with Clang?
- What's your favorite profiling tool (for C++)
- Alternatives to gprof
- C++ Code Profiler
- Confusing gprof output
I would respectfully disagree with Matt.
The tool I use all the time on Windows is the random-pausing technique, and it works with all languages that the IDE supports.
As an example of using it to do performance tuning, this case shows how a speedup of 43 times was achieved through a series of steps.
Gprof has a lot of problems, listed here, and according to the google-perftools manual, some of the same issues are repeated there, such as reporting procedures, not lines, emphasizing self (local) time, emphasizing the graph, etc. (I can't tell from the doc if it samples while blocked.)
As software systems become ever larger, self time becomes less and less relevant. The program counter spends most of its time in library routines or blocked in the system. Graphs become gigantic nests. People ask "I know function X is costly, but where in function X is the problem?" What's more, the "bottlenecks" get bigger and bigger, because the stack gets deeper on average, and every layer of the stack is a fresh opportunity to do more function calls than necessary.
An example of a stack-sampler that reports percent by line, and samples while blocked, and allows user control of sampling so as not to dilute the sample set during user input, is Zoom.
EDIT: Sorry, can't leave well enough alone. Here's a new explanation:
The way programs work, they trace out a call tree, which is a lot like the oak tree outside my window. It has a trunk (main) which sprouts branches (call sites) which sprout further branches for several levels out to leaves (instructions) and acorns (blocking calls).
When the tree surgeon comes to prune (optimize) it, does he look only where the leaves are (hotspots)? Does he ignore acorns (no samples during blocking)? No, he looks for branches (call sites) that are both heavy (on the stack a lot) and unhealthy (unnecessary). Those are what he prunes. That's what random-pausing and Zoom do, is help find those call sites.
You can use Callgrind to create profiling output. It is part of Valgrind. Callgrind-output could be used with KCacheGrind, which is probably worth a look as long as you're using Linux.
AMD CodeAnalyst is pretty nice. It's also cross platform which is nice when one finds a platform specific bottleneck.
精彩评论