The kth quantiles of an n-element set are the k - 1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O(n lg k)-time algorithm to list the kth quantiles of a set.
T开发者_C百科he straight forward solution would be to select every k, 2k, 3k .. ik the smallest element, whose running time is O(kn) (k calls to select procedure of O(n)). But this can be optimized to do better than O(kn). After finding the median of medians at the index 'i' in the select procedure, we make the following recursive call.
if the index of the median of medians i is > k, recursively call select the kth smallest element in the left subarray A[0 ... i]
if the i is < k, recursively select the n - i + k th smallest element in the right subarray A[i+1 ... n].
Can the above recursive calls be modified as below which would reduce the factor 'k' to 'log k' ?
if the index of the median of medians i is > k, recursively select the kth smallest element in the left subarray A[0 ... i] and also recursively select the n - k th smallest element in the right subarray A[i+1 ... n].
if the i is < k, recursively select the n - i + k th smallest element in the right subarray A[i+1 ... n] and also recursively select k th smallest element in the left subarray A[0 ... i].
The main call would be just select(A, k, n).
Note that we use a modified PARTITION
that is given an index to the pivot to be used as its last input parameter.
You start with KTH-QUANTILES(A, 1, n, 1, k-1, k)
KTH-QUANTILES(A, p, r, i, j, k)
n=A.length
m=floor((i+j)/2)
q=floor(m(n/k))
q=q-p+1
q=SELECT(A, p, r, q)
q=PARTITION(A, p, r, q)
if i<m
L=KTH-QUANTILES(A, p, q-1, i, m-1, k)
if m<j
R=KTH-QUANTILES(A, q+1, r, m+1, j, k)
return L U A[q] U R
The depth of the recursion tree is lg k
, since the partition is done around the median of the given order statistics (from i to j).
On each level of the recursion tree there are Θ(n) operations, so the running time is Θ(nlgk).
I haven't gone through your approach, but this is a question from Int. to Algorithms by Cormen. Anyhow I was browsing for a solution myself, and I would love to share my version of the algorithm. Try to refute it's correctness:
I will assume we have a O(n) statistic finding algorithm. So I can find k-th statistic in O(n) time. Suppose I say that I will find all n/k k-th quantiles using the divide-and-conquer such that:
If I have n' elements, I split the array into n'/2 parts, report the boundary k-th quantiles for both n'/2 partitions. And report the remaining quantiles recursively. In essence what I am doing is, after partitioning using median, I will extract the rightmost quantile from left array, leftmost quantile from right partition and after trimming these arrays run the algorithm recursively. My complexity analysis comes out to be:
T(n,k) = 2*T(n/2,k/2) + O(n).
This turns out to be O(nlogk) as the k/2 part will converge faster, though you might want to solve that more rigorously. Also we have used that n>k( obvious from the problem. Note that the task of extracting 2 quantiles and trimming the array will be done in O(n)
Without loss of generality, assume that n and k are powers of 2.
We first find the n/2th order statistic, in time O(n) using SELECT, then reduce the problem to finding the k/2th quantiles of the smaller n/2 elements and the k/2th quantiles of the larger n/2 elements.
Let T(n) denote the time it takes the algorithm to run on input of size n.
Then T(n) = cn + 2T(n/2) for some constant c, and the base case is T(n/k) = O(1).
Then we have: T(n) ≤ cn + 2T(n/2) ≤ 2cn + 4T(n/4) ≤ 3cn + 8T(n/8) . . . ≤ log(k)cn + kT(n/k) ≤ log(k)cn + O(k) = O(nlogk).
cc. Solution Manual
""" kth Quantiles """
import random, math
import SelectDetermin_1 as SD
Q=[]
def kQuantiles(A,l,r,k): # O(nlgk)
global Q
if k == 1:
return
i = math.floor(k/2)
n = r-l+1
incr = math.floor(i*n/k)
p = SD.Dselect(A,l,r,l+incr,5)
Q.append((l+incr,p))
q = SD.partition(A,l,r,l+incr)
kQuantiles(A,l,q,math.floor(k/2))
kQuantiles(A,q,r,math.ceil(k/2))
k = 13
A=list(range(100))
random.shuffle(A)
kQuantiles(A,0,len(A)-1,k)
Q.sort() # O(klgk)
print([x for _,x in Q])
def BFkQuantile(A,k): #O(kn)
out = []
Q = [math.floor(i * len(A)/k + 1/2) for i in range(1,k)]
for stat in Q:
out.append(SD.Dselect(A,0,len(A)-1,stat,5))
return out
print(BFkQuantile(A,k))
Where Dselect and partition are as in Avi Cohen's post, returning the entry in A at the required index, and the index at which the list is partitioned.
I couldn't make the previous code work so I wanted to check if this is ok
精彩评论