
Is it possible to split a sequence of numbers into two groups based on the median value without sorting?

Developer https://www.devze.com 2023-01-03 16:35 Source: Internet
Is there an algorithm to split a sequence of random numbers into two groups based on a median value determined on the fly (without sorting them)?

For example, if I have the sequence 2-3-6-7-1-4-5, the result would be two separate groups:

A) 1 2 3

B) 5 6 7

Median value: 4


You can find the median of an array (and split) in linear time.


Yes, this can be done in O(n).

First of all, if we already knew the median, we could easily split the sequence in two in O(n) by iterating the sequence and comparing each value with the median. So how do we find the median in O(n)?
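As a minimal sketch of that splitting pass, assuming the median is already known: `split_around` is a hypothetical helper name, and dropping values equal to the median is an assumption taken from the example in the question.

```ruby
# Split values around an already-known median in a single O(n) pass.
# `split_around` is an illustrative name, not part of the original answer.
def split_around(values, median)
  lower = []
  upper = []
  values.each do |v|
    if v < median
      lower << v
    elsif v > median
      upper << v
    end
    # Values equal to the median go into neither group (an assumption).
  end
  [lower, upper]
end

split_around([2, 3, 6, 7, 1, 4, 5], 4) #=> [[2, 3, 1], [6, 7, 5]]
```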

The basic idea is to use quicksort, but instead of recursively sorting both sides of the pivot, only recurse into the half that contains the median (i.e. the half that encompasses the index ⌈n/2⌉). This variant is known as quickselect. If our selection of a pivot guarantees geometric convergence (like median-of-medians does), then our overall algorithm will be O(n).

Algorithm

Let's call the current size of our array k, and the reduction due to median-of-medians c > 1, i.e. our pivot guarantees the array shrinks by a factor of at least c at each step.

  1. Estimate the median of the array using median-of-medians - O(k)
  2. Partition the array quicksort-style (with the estimate as our pivot) - O(k)
  3. Choose the half of the array containing the median (index ⌈n/2⌉). This new sub-array will have size no greater than k/c. Repeat steps 1 to 3 on it until we've determined the element whose position in the original array is ⌈n/2⌉.
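The steps above can be sketched in Ruby roughly as follows. This is an illustration under assumptions, not the answerer's code: the names `select_kth`, `median_of_medians` and `split_by_median` are made up here, the input is assumed to be non-empty, and the sketch copies sub-arrays instead of partitioning in place to keep it short.

```ruby
# Return the k-th smallest element (0-based) of a non-empty array.
# Worst-case linear time thanks to the median-of-medians pivot.
def select_kth(values, k)
  # Small inputs: sorting at most 5 elements is constant work.
  return values.sort[k] if values.size <= 5

  pivot = median_of_medians(values)            # step 1: estimate the median
  lower = values.select { |v| v < pivot }      # step 2: partition around it
  upper = values.select { |v| v > pivot }
  equal = values.count(pivot)

  # Step 3: recurse only into the part that contains index k.
  if k < lower.size
    select_kth(lower, k)
  elsif k < lower.size + equal
    pivot
  else
    select_kth(upper, k - lower.size - equal)
  end
end

# Pivot estimate: the median of the medians of groups of 5.
def median_of_medians(values)
  medians = values.each_slice(5).map { |group| group.sort[group.size / 2] }
  select_kth(medians, medians.size / 2)
end

# Find the median (the ceil(n/2)-th element, 1-based), then split in one pass.
def split_by_median(values)
  median = select_kth(values, (values.size - 1) / 2)
  [values.select { |v| v < median }, values.select { |v| v > median }, median]
end

split_by_median([2, 3, 6, 7, 1, 4, 5]) #=> [[2, 3, 1], [6, 7, 5], 4]
```

An in-place partition would avoid the extra allocations, at the cost of more index bookkeeping.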

The asymptotic running time of this algorithm is

2 O(n) + 2 O(n/c) + 2 O(n/c^2) + 2 O(n/c^3) + ...
= O(n)
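To spell out why the sum is linear, here is a short derivation. It assumes c > 1 and that steps 1 and 2 each cost at most a·k on a sub-array of size k, where the constant a is introduced only for this sketch:

```latex
T(n) \le \sum_{i=0}^{\infty} 2a\,\frac{n}{c^{i}}
     = 2a\,n \sum_{i=0}^{\infty} c^{-i}
     = 2a\,n \cdot \frac{c}{c-1}
     = O(n)
```

The geometric series converges precisely because c > 1, so the total work is a constant multiple of the work done at the first level.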


The BFPRT (Blum-Floyd-Pratt-Rivest-Tarjan) algorithm (see Wikipedia) can find the median in linear time, i.e. in O(n).

However, the constant hidden in the O-notation is so large that in practice, for reasonable array sizes, it is faster to sort the array in O(n log n).
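For comparison, a sort-based split might look like the sketch below. `split_by_sorting` is a hypothetical name, and leaving the middle element out of both groups when the length is odd is an assumption taken from the example in the question.

```ruby
# Baseline: sort once in O(n log n), then cut the sorted array in half.
def split_by_sorting(values)
  sorted = values.sort
  half = sorted.size / 2
  # For odd sizes, sorted[half] is the median and lands in neither group.
  [sorted.first(half), sorted.last(half)]
end

split_by_sorting([2, 3, 6, 7, 1, 4, 5]) #=> [[1, 2, 3], [5, 6, 7]]
```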


You can find the median by taking the average of the ⌈n/2⌉th largest item and the ⌈n/2⌉th smallest item (for odd n these are the same element; for even n they are the two middle elements). This can be done with help of this previous SO question.

After that, simply iterate through your array, putting elements greater than the median into one group and elements lower than the median into the other.
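A rough sketch of this approach, taking the ⌈n/2⌉th smallest and ⌈n/2⌉th largest items so that odd lengths are handled too. The helper names `kth_smallest` and `median_of` are hypothetical, and the selection routine is a plain quickselect with a random pivot (expected O(n), worst case O(n^2)), which is an assumption, since the linked question is not reproduced here. The input is assumed to be non-empty.

```ruby
# k-th smallest element (1-based) via quickselect with a random pivot.
def kth_smallest(values, k)
  pivot = values.sample
  lower = values.select { |v| v < pivot }
  equal = values.count(pivot)
  return kth_smallest(lower, k) if k <= lower.size
  return pivot if k <= lower.size + equal

  kth_smallest(values.select { |v| v > pivot }, k - lower.size - equal)
end

# Median as the average of the ceil(n/2)-th smallest and largest items.
def median_of(values)
  n = values.size
  k = (n + 1) / 2                          # ceil(n/2) with integer division
  low = kth_smallest(values, k)
  high = kth_smallest(values, n - k + 1)   # the k-th largest item
  (low + high) / 2.0
end

values = [2, 3, 6, 7, 1, 4, 5]
median = median_of(values)                 #=> 4.0
lower  = values.select { |v| v < median }  #=> [2, 3, 1]
upper  = values.select { |v| v > median }  #=> [6, 7, 5]
```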


Alternatively, if you knew the size of your sequence, you could create two collections of size floor(n/2): one "smallest half" (S) and one "largest half" (L), and then, one by one:

  • Take out one element in your sequence, call it e.
  • Put it into S if S is not full.
  • If S is full, find the largest element of S ∪ {e} (scan S for its maximum and compare it with e) and treat it as the candidate for L. If the candidate came from S, put e into S to refill it.
  • Add the candidate to L if L is not full. If L is full, find the smallest element of L ∪ {candidate} and discard it, so the candidate enters L only if it was not the one discarded.

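I originally took this to be O(n) time, but finding the largest element of S (or the smallest of L) is itself a scan, so as described it is O(n^2) in the worst case; keeping S and L in heaps would make it O(n log n). Someone correct me if I'm wrong. The worst-case scenario I could imagine is the original sequence being sorted in descending order.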

Ruby implementation (with many shortcuts that hurt performance):

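def split_into_halves(to_split)
  s = []  # smallest half
  l = []  # largest half
  medianlimit = to_split.size / 2
  return [s, l] if medianlimit.zero?  # fewer than two elements: nothing to split

  to_split.each do |e|
    if s.size < medianlimit
      s.push(e)
    else
      # Keep the smaller values in s; whatever is evicted (or e itself,
      # if it is not smaller than s.max) becomes the candidate for l.
      if s.max > e
        max = s.max
        s.delete_at(s.index(max))  # remove a single occurrence only
        s.push(e)
      else
        max = e
      end

      if l.size < medianlimit
        l.push(max)
      elsif l.min < max
        # l is full: drop its smallest element in favour of the candidate
        l.delete_at(l.index(l.min))
        l.push(max)
      end
    end
  end

  [s, l]
end

k = [2, 3, 6, 7, 1, 4, 5]
split_into_halves(k) #=> [[2, 3, 1], [7, 6, 5]]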