开发者

Why Arrays.sort is quicksort algorithm, why not another sort algorithm?

开发者 https://www.devze.com 2023-01-27 16:11 出处:网络
Why? Is it faster or more efficient?开发者_运维知识库 For systems with one core, we can use quicksort. What should we use on systems with two cores, four cores, or eight cores?Quicksort has the advan

Why? Is it faster or more efficient?开发者_运维知识库

For systems with one core, we can use quicksort. What should we use on systems with two cores, four cores, or eight cores?


Quicksort has the advantage of being completely in place, so it does not require any additional storage, while mergesort (which is actually used by Arrays.sort() for object arrays) and other (all?) guaranteed O(n*log n) algorithm require at least one full copy of the array. For programs that sort very large primitive arrays, that means potentially doubling the overall memory usage.


The answer is in Jon L. Bentley and M. Douglas McIlroy’s “Engineering a Sort Function”, which the sort function cites.

Shopping around for a better qsort, we found that a qsort written at Berkeley in 1983 would consume quadratic time on arrays that contain a few elements repeated many times—in particular arrays of random zeros and ones. In fact, among a dozen different Unix libraries we found no qsort that could not easily be driven to quadratic behavior; all were derived from the Seventh Edition or from the 1983 Berkeley function.…

Unable to find a good enough qsort, we set out to build a better one. The algorithm should avoid extreme slowdowns on reasonable inputs, and should be fast on ‘random’ inputs. It should also be efficient in data space and code space. The sort need not be stable; its specification does not promise to preserve the order of equal elements.

The alternatives were heapsort and mergesort, since Java was created in the early 1990s. Mergesort is less desirable because it requires extra storage space. Heapsort has a better worst-case performance (O(n log n) compared to O(n^2)), but performs more slowly in practice. Thus, if you can control the worst case performance via good heuristics, a tuned quicksort is the way to go.

Java 7 is switching to Timsort, which was invented in 1993 (implemented in Python in 2002) and has a worst-case performance of O(n log n) and is a stable sort.


Quicksort has O(n log n) average and O(n^2) worst case performance, that is the best "average case" a sort algorithm can be, there are other sort algorithms that have this performance, but quicksort tends to perform better than most.

See: http://en.wikipedia.org/wiki/Quicksort


It is a tuned quicksort. If you are really interested you can read the material mentioned in the documentation.

The sorting algorithm is a tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy's "Engineering a Sort Function", Software-Practice and Experience, Vol. 23(11) P. 1249-1265 (November 1993).

And here is a bit of an explanation - the tuned version gives n*log(n) on many data sets:

This algorithm offers n*log(n) performance on many data sets that cause other quicksorts to degrade to quadratic performance


Compared with Quicksort, Mergesort has less number of comparisons but larger number of moving elements.

In Java, an element comparison is expensive but moving elements is cheap. Therefore, Mergesort is used in the standard Java library for generic sorting

In C++, copying objects can be expensive while comparing objects often is relatively cheap. Therefore, quicksort is the sorting routine commonly used in C++ libraries.

ref: http://www.cs.txstate.edu/~rp44/cs3358_092/Lectures/qsort.ppt


Arrays.sort() uses multiple sorting algorithms depending on the size and elements in the array.

  • Insertion sort for small arrays
  • Merge sort for mostly sorted arrays
  • A highly tuned and adaptable dual-pivot & single pivot quicksort for everything else

So in practice we see that quicksort is very fast for large arrays of primitives but has some pitfalls when it needs to adapt to partially sorted arrays, when comparisons between objects are slow, for stable sorting and more.


Since its been a while since last answer on this thread, here is some updates...

It depend on the complexity and its relevancy to size of array plus probability when java researched these algorithms and just decided depending on Measurements and Benchmarks.

According to JAVA JDK 1.8 DOCS its self explanatory where it chooses the algorithm not only one but up to four to choose from according to some thresholds...

/**
     * If the length of an array to be sorted is less than this
     * constant, Quicksort is used in preference to merge sort.
     */
    private static final int QUICKSORT_THRESHOLD = 286;

    /**
     * If the length of an array to be sorted is less than this
     * constant, insertion sort is used in preference to Quicksort.
     */
    private static final int INSERTION_SORT_THRESHOLD = 47;

    /**
     * If the length of a byte array to be sorted is greater than this
     * constant, counting sort is used in preference to insertion sort.
     */
    private static final int COUNTING_SORT_THRESHOLD_FOR_BYTE = 29;

    /**
     * If the length of a short or char array to be sorted is greater
     * than this constant, counting sort is used in preference to Quicksort.
     */
    private static final int COUNTING_SORT_THRESHOLD_FOR_SHORT_OR_CHAR = 3200;

Reference Java DOC JDK 8

It event evolved to use parallel Sorting Sorting in Java

Java 8 comes with a new API – parallelSort – with a similar signature to the Arrays.sort() API:

@Test
public void givenIntArray_whenUsingParallelSort_thenArraySorted() {
    Arrays.parallelSort(toSort);

    assertTrue(Arrays.equals(toSort, sortedInts));
}

Behind the scenes of parallelSort(), it breaks the array into different sub-arrays (as per granularity in the algorithm of parallelSort). Each sub-array is sorted with Arrays.sort() in different threads so that sort can be executed in parallel fashion and are merged finally as a sorted array.

Note that the ForJoin common pool is used for executing these parallel tasks and then merging the results.

The result of the Arrays.parallelSort is going to be same as Array.sort of course, it’s just a matter of leveraging multi-threading.

Finally, there are similar variants of API Arrays.sort in Arrays.parallelSort as well:

Arrays.parallelSort (int [] a, int fromIndex, int toIndex);

Summary: So as the Java API evolves with the HardWare and software in general there is more use for the multi threading and tuning here and there on the thresholds and algorithims.


First of all Arrays.sort doesn't only use quick sort , It uses multiple algorithms java1.6 onwards

See below code from Arrays class

/** * Sorts the specified array into ascending numerical order. * *

Implementation note: The sorting algorithm is a Dual-Pivot Quicksort * by Vladimir Yaroslavskiy, Jon Bentley, and Joshua Bloch. This algorithm * offers O(n log(n)) performance on many data sets that cause other * quicksorts to degrade to quadratic performance, and is typically * faster than traditional (one-pivot) Quicksort implementations. * * @param a the array to be sorted */ public static void sort(int[] a) { DualPivotQuicksort.sort(a); }

DualPivotQuicksort.sort(a); // This uses 5 algorithms internally depending upon dataset size 
do checkout the source code of Arrays class.

Before java 1.6 I think it was using three algorithm quick sort for primitive types such as int and mergesort for objects and when quick sort out performs it start heap sort, See here for more details http://cafe.elharo.com/programming/java-programming/why-java-util-arrays-uses-two-sorting-algorithms


Quicksort is fastest on average O(n log(n)), so Sun probably used that as a good metric.


QuickSort is a common sorting algorithm. It's reasonably fast, except when the data to be sorted is already in inverse order. It's also efficient in space.


It depends on what you want to do. The problem with a normal quicksort is, that it can sometimes be in O(n²). So normaly you could use heap sort, but most times quick sort is faster.

However the Arrays.sort(...) implementation uses a "tuned tuned quicksort, adapted from Jon L. Bentley and M. Douglas McIlroy [...]" (according to the JavaDoc documentation). This algorithm has has some build in optimizations, that enables it to work on O(n*log(n)), where a normal quicksort would use O(n²).

Also the Arrays.sort algorithm is tested over and over again and you can be sure that it works and is bugfree (although this can't be guaranteed.)

iuiz


Arrays.sort() does not use quick-sort. Java 7 used TimSort which is a combination of Merge Sort and Insertion Sort. Java 8 uses parallel-sort when there are more number of elements, and uses multiple threads for sorting. Else it uses TimSort.

Thus the worst case time complexity is always O(nlogn)

0

精彩评论

暂无评论...
验证码 换一张
取 消