
Profiling sorting algorithms against partially sorted data

Developer (https://www.devze.com), 2023-02-13 03:02. Source: the web

We know that several sorts, such as insertion sort, are great on arrays that are 'mostly-sorted' and not so great on random data.

Suppose we wanted to profile the performance improvement or degradation of such an algorithm relative to how 'sorted' the input data is. What would be a good way to generate an 'increasingly sorted' or 'increasingly random' array of elements? How might we measure the 'sortedness' of the input?


The number of inversions is a common measure of how sorted an array is.

A pair of elements (p_i, p_j) in a permutation p is called an inversion if i < j and p_i > p_j. For example, the permutation (3,1,2,5,4) contains 3 inversions: (3,1), (3,2) and (5,4).

A sorted array has 0 inversions, and a reverse-sorted array of n distinct elements has the maximum, n*(n-1)/2.
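As a rough illustration of the inversion measure, here is a minimal Python sketch that counts inversions with a merge sort, which takes O(n log n) time instead of the O(n^2) of checking every pair. The function names `count_inversions` and `sortedness` are my own, and normalising the count to a score between 0 and 1 is an assumption on my part, not something the definition requires.

```python
def count_inversions(values):
    """Count pairs (i, j) with i < j and values[i] > values[j]."""

    def sort_and_count(seq):
        # Returns (sorted copy of seq, number of inversions in seq).
        if len(seq) <= 1:
            return list(seq), 0
        mid = len(seq) // 2
        left, left_inv = sort_and_count(seq[:mid])
        right, right_inv = sort_and_count(seq[mid:])

        merged = []
        inversions = left_inv + right_inv
        i = j = 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i])
                i += 1
            else:
                # right[j] is smaller than every element still waiting in
                # left, so it forms one inversion with each of them.
                inversions += len(left) - i
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged, inversions

    return sort_and_count(values)[1]


def sortedness(values):
    """Hypothetical score: 1.0 for sorted input, 0.0 for reverse-sorted."""
    n = len(values)
    max_inversions = n * (n - 1) // 2
    if max_inversions == 0:
        return 1.0
    return 1.0 - count_inversions(values) / max_inversions


print(count_inversions([3, 1, 2, 5, 4]))  # 3, matching the example above
```

Equal elements are not counted as inversions here, so with duplicates the reverse-sorted maximum of n*(n-1)/2 is not reached.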


You could generate a "partially sorted" dataset by interrupting a modern Fisher-Yates shuffle run on an already ordered dataset. The more steps the shuffle completes before you stop it, the more random the result.
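One way this might look in Python is sketched below. The function name `partially_shuffled` and its parameters `n`, `swaps` and `seed` are hypothetical, chosen only for this example. It runs the usual back-to-front Fisher-Yates loop on the ordered list 0..n-1 and stops after `swaps` steps.

```python
import random


def partially_shuffled(n, swaps, seed=None):
    """Run only the first `swaps` steps of a Fisher-Yates shuffle on 0..n-1."""
    rng = random.Random(seed)  # seeded so a profiling run can be repeated
    data = list(range(n))

    # A full shuffle takes n - 1 steps; clamp the request to that range.
    steps = max(0, min(swaps, n - 1))

    for step in range(steps):
        i = n - 1 - step        # current position, walking down from the end
        j = rng.randint(0, i)   # random index in [0, i], both ends included
        data[i], data[j] = data[j], data[i]
    return data


# swaps=0 leaves the data sorted; swaps=n-1 is a complete shuffle.
for swaps in (0, 5, 10, 19):
    print(swaps, partially_shuffled(20, swaps, seed=42))
```

Note that stopping early concentrates the disorder at the end of the array: the last `swaps` positions hold randomly chosen elements, while the front stays close to its original order. If you want the disorder spread evenly, swapping randomly chosen pairs a fixed number of times may be a better fit.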

Also, if you only need a few essentially fixed sets of partially sorted data, then you could generate a column graph of position vs value for each and just eyeball them. That would let you quickly see the general randomness of a set, as well as things like how much localised order there is.


Also look into creating a binary heap, and then using the array representation as your starting point. A binary heap implemented in an array is not sorted, but it is ordered. I think it would be considered "partially sorted."
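A small sketch of the heap idea, using Python's standard `heapq` module. The helper names `heap_ordered` and `is_min_heap` are made up for this example, and starting from a shuffled list is just one assumption about where the data comes from.

```python
import heapq
import random


def heap_ordered(n, seed=None):
    """Return 0..n-1 arranged as a binary min-heap in a plain list."""
    rng = random.Random(seed)
    data = list(range(n))
    rng.shuffle(data)
    heapq.heapify(data)  # rearranges the list in place, in linear time
    return data


def is_min_heap(data):
    """Check that every element is >= its parent at index (i - 1) // 2."""
    return all(data[(i - 1) // 2] <= data[i] for i in range(1, len(data)))


heap = heap_ordered(15, seed=7)
print(heap)
print(is_min_heap(heap))
```

The smallest element always ends up at index 0 and every parent precedes its children, but siblings are in no particular order, so the array usually still contains inversions.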
