I found two great profilers:
- OProfile
- Google Performance Tools
Has anyone tried both? Which is better?
oprofile is more accurate: it uses the CPU's performance-monitoring hardware (built-in hardware counters with hundreds of performance events), whereas google-perftools' libprofiler.so uses setitimer, the interval timer provided by the OS kernel:
$ nm -D libprofiler.so | grep timer
U getitimer
U setitimer
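To illustrate the setitimer approach, here is a minimal Python sketch of the same idea. It is not libprofiler's code; libprofiler does the analogous thing in C with the interrupted program counter and a stack unwind. The sketch asks the kernel for a SIGPROF every `interval` seconds of CPU time and, in the signal handler, records which function was running. The names `sample_profile` and `busy` are hypothetical, and this only works on Unix-like systems from the main thread.

```python
import collections
import signal
import time


def sample_profile(work, interval=0.001):
    """Run work() under an ITIMER_PROF timer; return (samples, cpu_seconds).

    Each SIGPROF records the name of the function whose frame was
    executing when the signal was handled.
    """
    samples = collections.Counter()

    def on_sigprof(signum, frame):
        # frame is the Python frame that was interrupted by the signal
        samples[frame.f_code.co_name] += 1

    old_handler = signal.signal(signal.SIGPROF, on_sigprof)
    # ITIMER_PROF counts CPU time (user + system) consumed by the process
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    start = time.process_time()
    try:
        work()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # disarm the timer
        signal.signal(signal.SIGPROF, old_handler)
    return samples, time.process_time() - start


def busy(n=3_000_000):
    """Hypothetical CPU-bound workload to be sampled."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # Request 10000 samples per second and see how many actually arrive
    samples, cpu = sample_profile(busy, interval=0.0001)
    n = sum(samples.values())
    print(f"{n} samples in {cpu:.2f}s CPU, effective rate {n / cpu:.0f} Hz")
    print(samples.most_common(3))
```

Comparing the requested rate with `n / cpu` is the same experiment described below; the result depends on how the kernel accounts CPU time.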
The interval timer is emulated by the OS, and as far as I know it can't fire more often than HZ (100, 250, 300 or 1000 times per second, depending on the kernel configuration). I tried sampling rates of both 10000 and 100000, but the effective rate was still 1000: the program ran for 2 seconds and the Google CPU profiler collected only ~2000 samples. This is my HZ:
$ zgrep HZ= /proc/config.gz
CONFIG_HZ=1000
I don't know how this works on a tickless kernel.
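The arithmetic above can be checked with a small sketch. The helper names are hypothetical, and `capped_rate` simply encodes the assumption from this post that the interval timer cannot fire faster than HZ. Note that /proc/config.gz exists only if the kernel was built with CONFIG_IKCONFIG_PROC.

```python
import gzip


def parse_config_hz(lines):
    """Return the integer value of CONFIG_HZ from kernel-config lines, or None."""
    for line in lines:
        line = line.strip()
        # Matches "CONFIG_HZ=1000" but not "CONFIG_HZ_1000=y"
        if line.startswith("CONFIG_HZ="):
            return int(line.split("=", 1)[1])
    return None


def read_config_hz(path="/proc/config.gz"):
    """Read CONFIG_HZ from the running kernel's compressed config."""
    with gzip.open(path, "rt") as f:
        return parse_config_hz(f)


def effective_rate(samples, seconds):
    """Samples actually collected per second of run time."""
    return samples / seconds


def capped_rate(requested_hz, kernel_hz):
    """Expected rate if the timer cannot fire faster than HZ (assumption)."""
    return min(requested_hz, kernel_hz)
```

With the numbers from this post, `effective_rate(2000, 2.0)` gives 1000.0, which matches CONFIG_HZ=1000, and `capped_rate(10000, 1000)` and `capped_rate(100000, 1000)` both give 1000.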
oprofile, in turn, uses special hardware in the CPU, and this hardware is accurate to within a few ticks. It can record where your program is at every 100000th or 1000000th CPU tick, and this period is not tied to the OS HZ setting. It can also sample not only on every N-th CPU tick but also on every N-th L2 cache miss, every N-th branch misprediction, and so on: there are hundreds of hardware performance events in any CPU since the Pentium Pro.
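The "every N-th event" idea can be shown with a toy simulation. This is not how oprofile is programmed; on real hardware the kernel driver loads a counter and the CPU raises an interrupt when it overflows. The event labels ("cycle", "l2_miss") and code locations are made-up placeholders, not oprofile event names.

```python
from collections import Counter


def sample_every_nth(events, period, event_type):
    """Simulate event-based sampling.

    events: iterable of (event_type, location) pairs in program order.
    Records the code location of every `period`-th occurrence of
    `event_type`, the way an overflowing hardware counter would.
    """
    counter = period          # counter loaded with the sampling period
    samples = Counter()
    for etype, location in events:
        if etype != event_type:
            continue          # this counter only counts one kind of event
        counter -= 1
        if counter == 0:      # "overflow": take a sample and reload
            samples[location] += 1
            counter = period
    return samples


# Hypothetical trace: 10 cycles in f, 6 cache misses in g, 5 cycles in h
trace = [("cycle", "f")] * 10 + [("l2_miss", "g")] * 6 + [("cycle", "h")] * 5
```

Sampling the same trace on different events gives different profiles: every 5th "cycle" lands twice in `f` and once in `h`, while every 3rd "l2_miss" lands twice in `g`.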
Another advantage of oprofile is that it can profile anything: a single user application, all user applications, or the kernel together with every application.
On the other hand, oprofile requires root (AFAIK), it can freeze your system if misused, and it must be enabled in the kernel when the kernel is built.
The advantages of google-perftools are that it is easy to use, has good graphing and analysis capabilities, and needs no root. It also includes a good heap profiler.
Both oprofile and google-perftools/cpuprofiler:
- need no recompilation of the application (unlike gprof/gcov, which required it)
- can draw a partial call graph (as kcachegrind does); pprof can even produce output for kcachegrind with its callgrind option
- measure the real profile, not an emulated one as with callgrind and other valgrind-based tools
- are portable (oprofile needs CPU support, but that exists for Intel, AMD, VIA and many ARM chips; perftools will work anywhere it can obtain a call stack and ask the OS to install a setitimer timer)
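Why the call graph is only "partial" can be seen in a small sketch that aggregates sampled call stacks into caller/callee edge counts. It is a generic illustration, not pprof's implementation; the function names in the stacks are hypothetical. Only edges that happened to be on the stack when a sample was taken appear, so rarely executed paths may be missing entirely.

```python
from collections import Counter


def call_graph(stack_samples):
    """Aggregate sampled call stacks into a partial call graph.

    stack_samples: list of stacks, each ordered outermost caller first,
    e.g. ["main", "parse", "read"].
    Returns (self_counts, edge_counts): how often each function was on
    top of the stack, and how often each caller->callee edge was seen.
    """
    self_counts = Counter()
    edge_counts = Counter()
    for stack in stack_samples:
        if not stack:
            continue
        self_counts[stack[-1]] += 1          # function that was executing
        for caller, callee in zip(stack, stack[1:]):
            edge_counts[(caller, callee)] += 1
    return self_counts, edge_counts


# Four hypothetical samples taken by a timer- or event-driven profiler
stacks = [
    ["main", "parse", "read"],
    ["main", "parse"],
    ["main", "compute"],
    ["main", "parse", "read"],
]
```

Here `read` is the hottest function (2 of 4 samples) and `main -> parse` is the most frequently seen edge (3 samples). A recursive function would be counted once per stack level, which real tools handle more carefully.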