Measuring running time of computational geometry algorithms

I am taking a course on computational geometry in the fall, where we will be implementing some algorithms in C or C++ and benchmarking them. Most of the students generate a few datasets and measure their programs with the time command, but I would like to be a bit more thorough.

I am thinking about writing a program to automatically generate different datasets, run my program with them and use R to test hypotheses and estimate parameters.
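Roughly, I imagine the generator looking something like the sketch below (a minimal C example; the uniform distribution, the fixed seed, and the file naming are just placeholders for whatever the experiments end up needing):

    #include <stdio.h>
    #include <stdlib.h>

    /* Write n uniform random points in the unit square to points_<n>.txt,
       one "x y" pair per line. */
    static void write_dataset(int n, unsigned int seed)
    {
        char name[64];
        snprintf(name, sizeof name, "points_%d.txt", n);
        FILE *f = fopen(name, "w");
        if (!f) { perror(name); exit(EXIT_FAILURE); }

        srand(seed);
        for (int i = 0; i < n; i++) {
            double x = (double)rand() / RAND_MAX;
            double y = (double)rand() / RAND_MAX;
            fprintf(f, "%.17g %.17g\n", x, y);
        }
        fclose(f);
    }

    int main(void)
    {
        /* geometrically spaced sizes make log-log plots in R easy */
        for (int n = 1000; n <= 1000000; n *= 10)
            write_dataset(n, 42);  /* fixed seed for reproducible runs */
        return 0;
    }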

So... How do you measure program running time more accurately?

What might be relevant to measure?

What hypotheses might be interesting to test (variance, effects caused by caching, etc.)?

Should I test my code on more than one machine? How should these machines differ?

My overall goals are to learn how these algorithms perform in practice, which implementation techniques are better and how the hardware actually performs.


Profilers are great. Valgrind is pretty popular. Also, I'd suggest trying your code out on RISC machines if you can get access to some. Their performance characteristics are different from those of CISC machines in interesting ways.


You could use the Windows API timing functions (they are not all that exact), or you can use the RDTSC instruction via inline assembler, which has sub-nanosecond resolution (don't forget that the instruction and the code around it add a small overhead of a few hundred cycles, but this is not a big issue).
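On current compilers you don't even need to write the assembler yourself: the __rdtsc intrinsic wraps the instruction. A minimal sketch (keep in mind it counts cycles, not seconds, and that out-of-order execution blurs the boundaries unless you add serializing instructions):

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>  /* GCC/Clang; on MSVC, __rdtsc lives in <intrin.h> */

    int main(void)
    {
        uint64_t start = __rdtsc();
        /* ... code under test ... */
        uint64_t end = __rdtsc();
        /* cycles, not seconds; divide by the CPU frequency to convert */
        printf("elapsed: %llu cycles\n", (unsigned long long)(end - start));
        return 0;
    }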


In order to get better accuracy with program metrics, you will have to run your program many times, such as 100 or 1000.
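To make that concrete, here is a minimal repetition harness using the portable clock() function (algorithm_under_test and the run count are placeholders); printing the raw samples, one per line, lets you do the real statistics in a separate tool. Link with -lm for sqrt.

    #include <math.h>
    #include <stdio.h>
    #include <time.h>

    #define RUNS 100

    static void algorithm_under_test(void) { /* ... */ }  /* placeholder */

    int main(void)
    {
        double sum = 0.0, sumsq = 0.0;

        for (int i = 0; i < RUNS; i++) {
            clock_t t0 = clock();
            algorithm_under_test();
            clock_t t1 = clock();
            double secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
            sum += secs;
            sumsq += secs * secs;
            printf("%.6f\n", secs);  /* raw samples on stdout */
        }

        double mean = sum / RUNS;
        double var = sumsq / RUNS - mean * mean;
        if (var < 0.0) var = 0.0;  /* guard against rounding */
        fprintf(stderr, "mean %.6f s, stddev %.6f s\n", mean, sqrt(var));
        return 0;
    }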

For more details on metrics, search the web for metrics and profiling.

Beware that programs may differ in performance (time) measurements due to things running in the background such as virus scanners, music players, and other programs with timers in them.

You could test your program on different machines. Processor clock rates, L1 and L2 cache sizes, RAM sizes, and disk speeds are all factors (as well as the number of other programs/tasks running concurrently). Floating-point performance may also be a factor.

If you want, you can challenge your compiler by printing assembly listings for various optimization settings, and see which setting produces the most compact or most efficient code.

Since you're processing data, look at data-driven design: http://www.gamearchitect.net/Articles/DataDrivenDesign.html


You can use the Windows High Performance Counter to get nanosecond-level accuracy. Technically, as far as I know, the counter can run at any rate, but you can query its counts per second, and on most CPUs it runs at a very high frequency.
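For reference, the counter is exposed through the QueryPerformanceFrequency and QueryPerformanceCounter calls; a minimal sketch:

    #include <stdio.h>
    #include <windows.h>

    int main(void)
    {
        LARGE_INTEGER freq, start, end;
        QueryPerformanceFrequency(&freq);  /* counts per second */

        QueryPerformanceCounter(&start);
        /* ... code under test ... */
        QueryPerformanceCounter(&end);

        double secs = (double)(end.QuadPart - start.QuadPart)
                    / (double)freq.QuadPart;
        printf("elapsed: %.9f s\n", secs);
        return 0;
    }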

What you should do is just get a professional profiler; that's what they're for. More realistically, however:

If you're only comparing between algorithms, then as long as your machine doesn't happen to excel in one particular area (Pentium D, SSD, that sort of thing), it shouldn't matter too much to do it on just one machine. If you want to look at cache effects, try running the algorithm right after the machine starts up (make sure you get a copy of Windows 7; it should be free for CS students), then leave the machine doing something cache-heavy, like image processing, for 24 hours or so to convince the OS to cache it. Then run the algorithm again and compare.


You didn't specify your platform. If you are on a POSIX system (e.g. Linux), have a look at clock_gettime. It lets you access different kinds of clocks, e.g. wall-clock time or CPU time, and you can also query the precision of each clock.
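A minimal sketch of both calls (CLOCK_MONOTONIC gives wall-clock time; swap in CLOCK_PROCESS_CPUTIME_ID for per-process CPU time; on older glibc you may need to link with -lrt):

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec res, t0, t1;

        clock_getres(CLOCK_MONOTONIC, &res);  /* precision of this clock */
        printf("resolution: %ld ns\n", res.tv_nsec);

        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* ... code under test ... */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double secs = (double)(t1.tv_sec - t0.tv_sec)
                    + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("elapsed: %.9f s\n", secs);
        return 0;
    }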

Since you are willing to do proper statistics on your numbers, you should repeat your experiments often enough that the statistical tests give you enough confidence.

If your measurements are not too fine-grained and your variance is low, this is often quite good with 10 probes or so. But if you go down to a small scale, say a short function, you might need to go much higher.

Also, you have to ensure reproducible experimental conditions: no other load on the machine, enough memory available, and so on.
