Does anyone know any assembly loop level profiler?
I have been using gprof, but it only profiles at the function level and hides loops. To optimize my code I need something that goes down to the loop level. I want it to be automated and simply give me output the way gprof does. I was recommended DTrace, but I have no idea where to start. Can anyone point me in the right direction? For example:
main:
pushl %ebp
movl %esp, %ebp
subl $16, %esp
movl $5000000, -4(%ebp)
movl $0, -12(%ebp)
movl $0, -8(%ebp)
jmp .L2
.L3:
movl -8(%ebp), %eax
addl %eax, -12(%ebp)
addl $1, -8(%ebp)
.L2:
movl -8(%ebp), %eax
cmpl -4(%ebp), %eax
jl .L3
movl $0, %eax
leave
ret
For example, gprof would report that main executed 1 time and foo executed 100 times. What I want to know is whether .L2 or .L3 executed 1M times, because that is where I should concentrate my optimization effort. If my question is vague, please ask me to explain more. Thanks.
It depends on what OS you are using, but for this kind of profiling you generally want to use a sampling profiler rather than an instrumented profiler, e.g.
- Linux: Zoom
- Mac OS X: Instruments
- Windows: VTune
I suggest using Callgrind (one of the Valgrind tools, and usually installed with it). This can gather statistics on a much more fine-grained level, and the kcachegrind tool is very good for visualising the results.
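To make concrete what a per-block execution count looks like for the loop in the question, here is a minimal hand-instrumented C sketch. It is only an illustration of the numbers the question is asking for, not a description of how Callgrind works internally. The struct and function names (`loop_counts`, `count_blocks`) are hypothetical, and the sum is made unsigned so that wraparound is well defined in C.

```c
#include <assert.h>

/* Hypothetical counters, one per assembly label in the question. */
struct loop_counts {
    unsigned long body;   /* executions of the .L3 block (loop body)  */
    unsigned long check;  /* executions of the .L2 block (loop test)  */
};

/* Runs the same loop as the assembly listing and counts block entries. */
struct loop_counts count_blocks(int n)
{
    struct loop_counts c = {0, 0};
    unsigned int sum = 0;   /* unsigned: overflow wraps instead of being undefined */
    int i = 0;

    assert(n >= 0);

    for (;;) {
        c.check++;          /* .L2: compare i with n */
        if (!(i < n))
            break;
        c.body++;           /* .L3: sum += i; i += 1 */
        sum += (unsigned int)i;
        i++;
    }
    (void)sum;              /* result unused, as in the original listing */
    return c;
}
```

Calling `count_blocks(5000000)` gives `body == 5000000` and `check == 5000001`: the test block runs once more than the body because of the final failing comparison. A profiler that works at the instruction level reports counts of this kind without any source changes.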
If you're on Linux, Zoom is an excellent choice.
If you're on Windows, LTProf might be able to do it.
On any platform, the low-tech method of random pausing can be relied on.
Don't look for how many times instructions are executed. Look for where the program counter is found a large fraction of the time. (They're not the same thing.) That will tell you where to concentrate your optimization efforts.
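A small sketch of why the two are not the same thing, under an assumed and deliberately simplified cost model: each block has an execution count and an average cost per execution, and the fraction of time spent in it is what a program-counter sample would estimate. The struct, the function name `time_share`, and all the numbers are made up for illustration.

```c
#include <assert.h>

/* Hypothetical per-block data: how often the block runs and an assumed
   average cost in cycles per execution (higher if it misses cache, say). */
struct block {
    const char *name;
    unsigned long executions;
    double cycles_each;
};

/* Fraction of total time spent in blocks[idx]. This is the quantity that
   sampling the program counter estimates, as opposed to the raw count. */
double time_share(const struct block *blocks, int count, int idx)
{
    double total = 0.0;

    assert(idx >= 0 && idx < count);

    for (int i = 0; i < count; i++)
        total += (double)blocks[i].executions * blocks[i].cycles_each;

    if (total == 0.0)
        return 0.0;
    return (double)blocks[idx].executions * blocks[idx].cycles_each / total;
}
```

With a cheap block that runs 1,000,000 times at 1 cycle each and a costly block that runs 10,000 times at 300 cycles each, the costly block gets a share of 0.75 even though it executes 100 times less often. Ranking by execution count alone would point at the wrong block.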
KCachegrind gives profiling information for each line of source code (see this screenshot), including CPU time, cache misses, and so on. It has saved my day a couple of times.
However, running the code inside the profiler is extremely slow (tens of times slower than native).