Our tool generates performance logs in diagnostic mode; however, we track performance as wall-clock code execution time (Stopwatch + milliseconds).
Obviously it's not reliable at all: the testing system's CPU can be used by some random process, the results will be totally different if the tool is configured to run 10 threads rather than 2, etc.
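For reference, the diagnostic-mode timing is roughly along these lines (a simplified sketch; the helper name is just for illustration):

```csharp
using System;
using System.Diagnostics;

static class DiagnosticTimer
{
    // Wraps a piece of work with a Stopwatch and logs wall-clock milliseconds.
    public static void LogElapsed(string label, Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        // Wall-clock time: includes any time the thread spent waiting or preempted by other processes.
        Console.WriteLine($"{label}: {sw.ElapsedMilliseconds} ms");
    }
}
```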
My question is:
What's the correct way to measure the CPU time consumed by a piece of code (not by the whole process)?
What I mean by CPU Time:
Basically, how many cycles the CPU spent. I assume this will always be the same for the same piece of code on the same computer and not affected by other processes. There might be some fundamental stuff I'm missing here; if so, please enlighten me in the comments or answers.
P.S. Using a profiler is not possible in our setup
Another update,
Why I'm not going to use a profiler
Because we need to test the code in different environments with different data, where we don't have a profiler, an IDE, or anything like that. Hence the code itself should handle it. An extreme option might be using a profiler's DLL, but I don't think this task requires such a complex solution (assuming there is no free and easy-to-implement profiling library out there).
> I assume this will always be the same for the same piece of code on the same computer and not affected by other processes
That's just not the way computers work. Code very much is affected by other processes running on the machine. A typical Windows machine has about 1000 active threads; you can see the number in the Performance tab of Taskmgr.exe. The vast majority of them are asleep, waiting for some kind of event signaled by Windows. Nevertheless, if the machine is running code that is ready to go and take CPU time, including yours, then Windows will give all of it a slice of the pie.
Which makes measuring the amount of time taken by your code a pretty arbitrary measurement. The only thing you can estimate is the minimum amount of time taken, which you do by running the test dozens of times; odds are decent that you'll get a sample that wasn't affected by other processes. That will, however, never happen in Real Life, so you'd be wise to take the median value as a realistic perf measurement.
The only truly useful measurement is measuring incremental improvements to your algorithm: change code, and see how the median time changes because of that.
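A minimal sketch of the run-many-times-and-take-the-median idea, assuming a C#/Stopwatch setup like the question's (the helper name is illustrative):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

static class MedianTimer
{
    // Runs the workload several times and reports the median wall-clock time,
    // which is far less distorted by other processes than any single measurement.
    public static double MedianMilliseconds(Action work, int runs = 30)
    {
        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            work();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        var sorted = samples.OrderBy(x => x).ToArray();
        int mid = runs / 2;
        // Average the two middle samples when the run count is even.
        return runs % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }
}
```

Re-run this before and after a change and compare the medians.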
> Basically how many cycles the CPU spent. I assume this will always be the same for the same piece of code on the same computer and not affected by other processes. There might be some fundamental stuff I'm missing here; if so, please enlighten me in the comments or answers.
CPU time used by a function is a really squishy concept.
- Does it include I/O performed anywhere beneath it in the call tree?
- Is it only "self time" or inclusive of callees? (In serious code, self time is usually about zero.)
- Is it averaged over all invocations? Define "all".
- Who consumes this information, for what purpose? To find so-called "bottlenecks"? Or just some kind of regression-tracking purpose?
If the purpose is not just measurement, but to find code worth optimizing, I think a more useful concept is Percent Of Time On Stack. An easy way to collect that information is to read the function call stack at random wall-clock times (during the interval you care about); a small sketch of turning such samples into percentages follows the list below. This has the properties:
- It tells inclusive time percent.
- It gives line-level (not just function-level) percent, so it pinpoints costly lines of code, whether or not they are function calls.
- It includes I/O as well as CPU time. (In serious software it is not wise to exclude I/O time, because you really don't know what's spending time several layers below in the call tree.)
- It is relatively insensitive to competition for the CPU by other processes, or to CPU speed.
- It does not require a high sample rate or a large number of samples to find costly code. (This is a common misconception.) Each additional digit of measurement precision requires roughly 100 times more samples, but that does not locate the costly code any more precisely.
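To make the arithmetic concrete, here is a small sketch (not Zoom's implementation; it assumes the stack samples have already been collected somehow, e.g. by pausing under a debugger at random times) that turns a set of samples into inclusive percent-on-stack figures:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class StackSampleStats
{
    // Each sample is the set of call sites (e.g. "Parser.cs:123") seen on the stack
    // at one random wall-clock moment. A site's inclusive percent is simply the
    // fraction of samples it appears on.
    public static IEnumerable<(string Site, double Percent)> PercentOnStack(
        IReadOnlyList<IReadOnlyCollection<string>> samples)
    {
        var counts = new Dictionary<string, int>();
        foreach (var sample in samples)
            foreach (var site in sample.Distinct())   // count a site at most once per sample
                counts[site] = counts.TryGetValue(site, out var n) ? n + 1 : 1;

        return counts
            .Select(kv => (kv.Key, 100.0 * kv.Value / samples.Count))
            .OrderByDescending(x => x.Item2);
    }
}
```

A site that shows up on, say, 6 of 20 samples is on the stack roughly 30% of the time, which is plenty of signal to decide whether it is worth looking at.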
A profiler that works on this principle is Zoom.
On the other hand, if the goal is simply to measure, so the user can see if changes have helped or hurt performance, then the CPU environment needs to be controlled, and simple overall time measurement is what I'd recommend.
The hands-down best way to measure CPU time is to use the instruction "rdtsc", or "Read Time Stamp Counter". This counter (part of the CPU itself) increments at the CPU's internal clock speed, so the difference between two readouts is the number of elapsed clock cycles. Reading this counter can be integrated into your code if it (the code) is not too high-level (not quite sure, though). You can measure time on disk, time on network, time in CPU, etc.; the possibilities are endless. If you divide the number of elapsed clock cycles by your CPU speed in megahertz, you will get the elapsed number of microseconds, for example. That's pretty good precision, and better is possible. Consider building a GUI which interfaces to your CPU-usage statistics.
Search for "rdtsc" or "_rdtsc" in the help files for your environment.
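There is no rdtsc intrinsic exposed to C# directly; if the tool is managed code, one alternative that also yields raw cycle counts is the Win32 QueryThreadCycleTime API (Vista and later), which reports the cycles the scheduler charged to a thread. This is a sketch of that substitute, not the rdtsc approach described above:

```csharp
using System;
using System.Runtime.InteropServices;

static class CycleCounter
{
    // Win32: number of CPU cycles charged to the given thread by the scheduler.
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool QueryThreadCycleTime(IntPtr threadHandle, out ulong cycleTime);

    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    // Returns the cycles the *current thread* consumed while running the workload;
    // time spent preempted by other processes is not charged to this thread.
    public static ulong CyclesFor(Action work)
    {
        QueryThreadCycleTime(GetCurrentThread(), out ulong before);
        work();
        QueryThreadCycleTime(GetCurrentThread(), out ulong after);
        return after - before;
    }
}
```

Dividing the cycle count by the CPU speed in MHz gives microseconds, as described above, with the caveat that modern CPUs vary their clock speed.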