I was given the following question in an algorithms book:
Suppose a merge sort is implemented to split a file at a random position, rather than exactly in the middle. How many comparisons would be used by such a method to sort n elements on average?
Thanks.
To guide you to the answer, consider these more specific questions:
Assume the split is always at 10%, or 25%, or 75%, or 90%. In each case: what's the impact on the recursion depth? How many comparisons need to be made per recursion level?
I partially agree with @Armen: they should be comparable.
But consider the case when they are split in the middle. To merge two lists of length n each, we would need 2*n - 1 comparisons (sometimes fewer, but we'll consider it fixed for simplicity), each of them producing the next value. There would be log2(n) levels of merges, which gives us approximately n*log2(n) comparisons.
Now consider the random-split case. The maximum number of comparisons needed to merge a list of length n1 with one of length n2 will be n1 + n2 - 1. However, the average number will be close to it, because even for the most unlucky split, 1 and n-1, we'll need an average of n/2 comparisons. So we can consider that the cost of merging per level will be the same as in the even case.
The difference is that in the random case the number of levels will be larger, and we can consider that n for the next level would be max(n1, n2) instead of n/2. This max(n1, n2) will tend to be 3*n/4, which gives us the approximate formula
n * log_{4/3}(n) // logarithm in base 4/3
which gives us
n * log2(n) / log2(4/3) ~= 2.4 * n * log2(n)
This result is still larger than the correct one because we ignored that the smaller list will have fewer levels, but it should be close enough. I suppose the correct answer is that the number of comparisons will double on average.
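One way to sanity-check the estimate above is to count comparisons empirically. The following is only a rough sketch, not the book's intended solution: the function names (`merge_count`, `merge_sort_count`, `average_comparisons`) are made up for illustration, and it assumes the split position is chosen uniformly from 1..n-1 and that the input is random floats.

```python
import random

def merge_count(left, right):
    """Merge two sorted lists; return (merged list, comparisons used)."""
    merged, i, j, comps = [], 0, 0, 0
    while i < len(left) and j < len(right):
        comps += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the rest is copied without comparisons.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, comps

def merge_sort_count(items, rng=None):
    """Sort items and count comparisons.

    Splits in the middle when rng is None, otherwise at a position
    drawn uniformly from 1..n-1 (an assumption of this sketch).
    """
    n = len(items)
    if n <= 1:
        return list(items), 0
    k = n // 2 if rng is None else rng.randint(1, n - 1)
    left, c_left = merge_sort_count(items[:k], rng)
    right, c_right = merge_sort_count(items[k:], rng)
    merged, c_merge = merge_count(left, right)
    return merged, c_left + c_right + c_merge

def average_comparisons(n, trials, rng, random_split):
    """Average comparison count over `trials` random inputs of length n."""
    total = 0
    for _ in range(trials):
        data = [rng.random() for _ in range(n)]
        _, comps = merge_sort_count(data, rng if random_split else None)
        total += comps
    return total / trials

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (100, 1000):
        mid = average_comparisons(n, 50, rng, random_split=False)
        rnd = average_comparisons(n, 50, rng, random_split=True)
        print(n, mid, rnd, rnd / mid)
```

Printing the ratio `rnd / mid` for a few values of n shows how far the measured factor is from the 2.4 and "double" estimates. Note that the random-split recursion can get deep for unlucky splits, so very large n may need an iterative version.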
You can get an upper bound of 2n * H_{n - 1} <= 2n (1 + ln n) using the fact that merging two lists of total length n costs at most n comparisons. The analysis is similar to that of randomized quicksort (see http://www.cs.cmu.edu/afs/cs/academic/class/15451-s07/www/lecture_notes/lect0123.pdf).
First, suppose we split a list of length n into 2 lists L and R. We will charge the first element of R for a comparison against all of the elements of L, and the last element of L for a comparison against all elements of R. Even though these may not be the exact comparisons that are executed, the total number of comparisons we are charging for is n as required.
This handles one level of recursion, but what about the rest? We proceed by concentrating only on the "right-to-left" comparisons that occur between the first element of R and every element of L at all levels of recursion (by symmetry, this will be half the actual expected total). The probability that the jth element is compared to the ith element is 1/(j - i) where j > i. To see this, note that element j is compared with element i exactly when it is the first element chosen as a "splitting element" from among the set {i + 1,..., j}. That is, elements i and j are split into two lists as soon as the list they are in is split at some element from {i + 1,..., j}, and element j is charged for a comparison with i exactly when element j is the element that is chosen from this set.
Thus, the expected total number of comparisons involving j is at most H_{n - 1} (i.e., 1 + 1/2 + 1/3 + ..., where the number of terms is at most n - 1). Summing across all possible j gives n * H_{n - 1}. This only counted "right-to-left" comparisons, so the final upper bound is 2n * H_{n - 1}.
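As a quick numerical check of this bound, here is a small sketch that evaluates the expected charge directly from the recurrence implied by the charging scheme: a list of length m >= 2 is charged m, then splits at a uniformly random position in 1..m-1. The helper names (`harmonic`, `expected_charged`, `upper_bound`) are mine, not from the lecture notes.

```python
def harmonic(m):
    """H_m = 1 + 1/2 + ... + 1/m, with H_0 = 0."""
    return sum(1.0 / k for k in range(1, m + 1))

def expected_charged(n):
    """Expected total charge E(n), where
    E(1) = 0 and E(m) = m + (2 / (m - 1)) * (E(1) + ... + E(m - 1)).
    The factor 2 comes from the two sublists being symmetric."""
    cost = [0.0] * (n + 1)   # cost[m] holds E(m); cost[0] is unused
    prefix = 0.0             # running sum E(1) + ... + E(m - 1)
    for m in range(2, n + 1):
        prefix += cost[m - 1]
        cost[m] = m + 2.0 * prefix / (m - 1)
    return cost[n]

def upper_bound(n):
    """The bound 2n * H_{n-1} from the argument above."""
    return 2 * n * harmonic(n - 1)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, expected_charged(n), upper_bound(n))
```

The recurrence value should never exceed `upper_bound(n)`; the gap between the two comes from bounding every element's sum by H_{n - 1} instead of its own shorter harmonic sum.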