
How can I prove that one algorithm is faster than another in Java

Is there anything in Java that would allow me to take a code snippet and see exactly how many "ticks" it takes to execute? I want to prove that an algorithm I wrote is faster than another.


"Ticks"? No. I'd recommend that you run them several times each and compare the average results:

public class AlgorithmDriver {
    public static void main(String [] args) {
        int numTries = 1000000;
        long begTime = System.currentTimeMillis();
        for (int i = 0; i < numTries; ++i) {
            Algorithm.someMethodCall();
        }
        long endTime = System.currentTimeMillis();
        System.out.printf("Total time for %10d tries: %d ms\n", numTries, (endTime-begTime));
    }
}


You probably are asking two different questions:

  1. How can you measure the run time of a java implementation (Benchmarking)
  2. How can you prove the asymptotic run time of an algorithm

For the first of these I wouldn't use the solutions posted here. They are mostly not quite right. First, it's probably better to use System.nanoTime than System.currentTimeMillis. Second, you should wrap the measured code in a try/finally block so you still record a time if it throws. Third, collect statistics over the running times of your code across many runs, with the aggregation done outside the measured region, so that you get a more complete picture. Run code that looks vaguely like this many times:

long totalTime = 0;
long startTime = System.nanoTime();
try {
    // method to test
} finally {
    totalTime = System.nanoTime() - startTime; // recorded even if the method throws
}

Getting benchmarking correct is hard. For example, you must let your code "warm up" for a while before timing it, so the JIT compiler has optimized the hot paths. Benchmark early and often, but don't put too much faith in your benchmarks. Small micro-benchmarks in particular almost always lie in one way or another.
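
As a rough sketch of that warmup-then-measure pattern (doWork here is a placeholder for the code under test; for serious work use a harness like JMH, mentioned in another answer below):

public class WarmupBenchmark {
    static long sink; // consume results so the JIT can't eliminate the work

    // Placeholder for the code under test.
    static long doWork(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    public static void main(String[] args) {
        // Warmup phase: give the JIT compiler time to compile the hot path.
        for (int i = 0; i < 100_000; i++) sink += doWork(1_000);

        // Measurement phase: collect a sample of timings, not a single number.
        int samples = 50;
        long total = 0;
        for (int s = 0; s < samples; s++) {
            long start = System.nanoTime();
            sink += doWork(1_000);
            total += System.nanoTime() - start;
        }
        System.out.printf("mean: %d ns over %d samples (sink=%d)%n",
                total / samples, samples, sink);
    }
}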

The second way to interpret your question is about asymptotic run times. The truth is that this has almost nothing to do with Java; it is general computer science. Here the question we want to ask is: what curve describes the behavior of our algorithm's run time as a function of the input size?

The first thing is to understand Big-Oh notation. I'll do my best, but SO doesn't support math notation. O(f(n)) denotes a set of functions such that, in the limit as n goes to infinity, f(n) is within a constant factor of an upper bound on the algorithm's run time. Formally, T(n) is in O(f(n)) iff there exist some constant n0 and some constant c such that for all n > n0, c*f(n) >= T(n). Big Omega is the same thing, except for lower bounds, and big Theta f(n) just means it's both big Oh of f(n) and big Omega of f(n). This is not too hard.

Well, it gets a little more complicated because we can talk about different kinds of run time, i.e. average case, best case, and worst case. For example, quicksort is normally O(n^2) in the worst case, but O(n log n) for random lists.

So I skipped over what T(n) means. Basically it is the number of "ticks." Some machine instructions (like reading from memory) take much longer than others (like adding). But as long as they are only a constant factor apart from each other, we can treat them all as the same for the purposes of big Oh, since doing so only changes the value of c.

Proving asymptotic bounds isn't that hard. For simple structured programs you just count the operations:

public int square(int n){
   int sum = 0;
   for(int i = 0; i < n; i++){
     sum += n;
   }
   return sum;
}

In this example we have one instruction each for: initializing sum, initializing i, and returning the value. The loop runs n times, and on each iteration we do a comparison, an addition, and an increment. So we have T(n) = 3 + 3n. Using n0 of 2 and c of 4 we can easily prove this is in O(n), since 4n >= 3n + 3 for all n > 2. You can always safely simplify big Oh expressions by removing excess constant terms and by dividing by constant multiples.

When you are faced with a recursive function you have to solve a recurrence relation. If you have a recurrence like T(n) = 2*T(n/2) + O(1), you want to find a closed form solution. You sometimes have to do this by hand, or with a computer algebra system. For this example, using forward substitution, we can see the pattern (in an abuse of notation) T(1) = O(1), T(2) = O(3), T(4) = O(7), T(8) = O(15). This looks a lot like O(2n - 1). To prove this is the right value:

 T(n) = 2*T(n/2) + 1
 T(n) = 2*(2(n/2) - 1) + 1
 T(n) = 2*(n-1) + 1
 T(n) = 2n - 2 + 1
 T(n) = 2n - 1

As we saw earlier, you can simplify O(2n - 1) to O(n).

More often, though, you can use the master theorem, a mathematical tool that saves you time on this kind of problem. If you check Wikipedia you can find the master theorem; if you plug in the example above, you get the same answer.
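
For the recurrence above, the master theorem form is T(n) = a*T(n/b) + f(n) with a = 2, b = 2, and f(n) = O(1):

 log_b(a) = log_2(2) = 1
 f(n) = O(1) = O(n^0), and 0 < 1, so the case where the leaves dominate applies
 T(n) = Theta(n^(log_b a)) = Theta(n)

which agrees with the O(n) we derived by substitution.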

For more, check out an algorithms textbook like Levitin's "The Design & Analysis of Algorithms".


You could use System.currentTimeMillis() to get start and end times.

long start = System.currentTimeMillis();

// your code

long end = System.currentTimeMillis();
System.out.println( "time: " + (end - start) );


You can measure wall time with System.currentTimeMillis() or System.nanoTime() (which have different characteristics). This is relatively easy as you just have to print out the differences at the end.

If you need to count specific operations (which is common in algorithm analysis), the easiest way is to simply increment a counter as the operations are performed, and then print it when you are done. A long is well suited for this. For multiple operations, use multiple counters.
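
For example, a minimal sketch counting comparisons in a linear search (the names here are just illustrative):

public class CountingSearch {
    static long comparisons = 0; // incremented on every element comparison

    static int linearSearch(int[] a, int key) {
        for (int i = 0; i < a.length; i++) {
            comparisons++;            // count the operation the analysis cares about
            if (a[i] == key) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {5, 3, 8, 1, 9, 2};
        linearSearch(data, 9);
        System.out.println("comparisons: " + comparisons);
    }
}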


I had to do algorithm efficiency proofs like this for my Data Structures course this year.

First, I measured the time as mentioned above. Then I increased the method's input size by a factor of ten each time (10, 100, 1000, ...). Lastly, I put the time measurements in an Excel file and drew graphs of these time values.

This way, you can roughly check whether one algorithm is faster than another.
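
A minimal sketch of that measurement loop, writing one CSV line per input size so the results can be pasted into Excel (algorithmUnderTest is a placeholder):

import java.io.PrintWriter;

public class GrowthMeasurement {
    // Placeholder for the algorithm under test.
    static void algorithmUnderTest(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        if (sum == -1) System.out.println(); // keep the JIT from removing the loop
    }

    public static void main(String[] args) throws Exception {
        try (PrintWriter out = new PrintWriter("timings.csv")) {
            out.println("n,millis");
            for (int n = 10; n <= 1_000_000; n *= 10) {
                long start = System.currentTimeMillis();
                algorithmUnderTest(n);
                out.println(n + "," + (System.currentTimeMillis() - start));
            }
        }
    }
}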


I would:

  • Come up with a few data sets for the current algorithm: a set where it performs well, a set where it performs ok, and a data set where it performs poorly. You want to show that your new algorithm outperforms the current one for each scenario.
  • Run and measure the performance of each algorithm multiple times for increasing input sizes of each of the three data sets, then take average, standard deviation etc. Standard deviation will show a crude measure of the consistency of the algorithm performance.
  • Finally look at the numbers and decide in your case what is important: which algorithm's performance is more suitable for the type of input you will have most of the time, and how does it degrade when the inputs are not optimal.

Timing the algorithm is not necessarily everything - would memory footprint be important as well? One algorithm might be better computationally, but it might create more objects while it runs, etc. Just trying to point out there is more to consider than pure timing!
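
If memory matters, a very rough way to estimate a footprint is with Runtime (these numbers are approximate at best, since the garbage collector runs when it pleases):

public class MemorySketch {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        rt.gc(); // only a hint; the JVM may ignore it
        long before = rt.totalMemory() - rt.freeMemory();

        int[] data = new int[10_000_000]; // stand-in for the algorithm's allocations
        data[0] = 1;

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.printf("approx. extra bytes used: %d%n", after - before);
    }
}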


I wouldn't use the current time in milliseconds as some of the others have suggested. The methods provided by ThreadMXBean are more accurate (I dare not say 100% accurate).

They actually measure the CPU time taken by the thread rather than elapsed system time, which may be skewed by context switches performed by the underlying OS.
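
A minimal sketch of measuring per-thread CPU time this way:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuTimeDemo {
    public static void main(String[] args) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!bean.isCurrentThreadCpuTimeSupported()) {
            System.out.println("CPU time measurement not supported on this JVM");
            return;
        }
        long start = bean.getCurrentThreadCpuTime(); // nanoseconds of CPU time, not wall time

        long sum = 0; // placeholder workload
        for (int i = 0; i < 50_000_000; i++) sum += i;

        long cpuNanos = bean.getCurrentThreadCpuTime() - start;
        System.out.println("cpu time: " + cpuNanos + " ns (sum=" + sum + ")");
    }
}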

Java Performance Testing


I am not too familiar with the Java framework, but I would do it the following way:

  1. Define a set of test cases (mostly example data) that can be used for both algorithms
  2. Implement a timing method to measure the amount of time that a specific function takes
  3. Create a loop and execute method A repeatedly (e.g. 1000 times with the whole test data). Measure the timing of the whole loop, not the sum of the single calls, since the timing functions themselves can bias your result when called a lot
  4. Do the same for method B
  5. Compare your results and choose a winner (see the sketch below)
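
A sketch of that procedure (methodA and methodB stand in for the two implementations under test):

public class CompareAB {
    static long sink; // consume results so the JIT can't discard the work

    static long methodA(int n) { long s = 0; for (int i = 0; i < n; i++) s += i; return s; }
    static long methodB(int n) { long s = 0; for (int i = n - 1; i >= 0; i--) s += i; return s; }

    static long timeLoop(java.util.function.IntToLongFunction f, int n, int reps) {
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) sink += f.applyAsLong(n);
        return System.nanoTime() - start; // time the whole loop, not each call
    }

    public static void main(String[] args) {
        int n = 100_000, reps = 1_000;
        System.out.println("A: " + timeLoop(CompareAB::methodA, n, reps) + " ns");
        System.out.println("B: " + timeLoop(CompareAB::methodB, n, reps) + " ns");
    }
}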


If both algorithms have the same definition of a macro-level "tick" (e.g. walking one node in a tree) and your goal is to prove that your algorithm accomplishes its goal in a lower number of those macro-level ticks than the other, then by far the best way is to just instrument each implementation to count those ticks. That approach is ideal because it doesn't reward low-level implementation tricks that can make code execute faster but are not algorithm-related.

If you don't have that luxury, but you are trying to determine which approach solves the problem using the least CPU resources, then contrary to the approaches listed here involving System.currentTimeMillis etc., I would use an external approach: the Linux time command is ideal. You have each program run on the same set of (large) inputs, preferably ones that take on the order of minutes or hours to process, and just run time java algo1 versus time java algo2.


If your purpose is to compare the performance of two pieces of code, the best way to do it is using JMH. You can import it via Maven, and it is now official in OpenJDK 12.

https://openjdk.java.net/projects/code-tools/jmh/
