Find the minimum number of elements required so that their sum equals or exceeds S

I know this can be done by sorting the array and taking the larger numbers until the required condition is met. That would take at least n log(n) sorting time.

Is there any improvement over n log(n)?

We can assume all numbers are positive.


Here is an algorithm that is O(n + size(smallest subset) * log(n)). If the smallest subset is much smaller than the array, this will be O(n).

Read http://en.wikipedia.org/wiki/Heap_%28data_structure%29 if my description of the algorithm is unclear (my description is light on details, but that page has them all).

  1. Turn the array into a max-heap, so that the biggest element is always available; building the heap takes O(n).
  2. Repeatedly extract the biggest element from the heap until the extracted elements' sum reaches S. This takes O(size(smallest subset) * log(n)).

This is almost certainly the answer they were hoping for, though not getting it shouldn't be a deal breaker.
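
For concreteness, here is a minimal sketch of this in Python (the function name is mine; heapq only provides a min-heap, so values are negated to extract the maximum):

import heapq

def min_elements_heap(arr, s):
    # Build a max-heap by negating values (heapq is a min-heap): O(n).
    heap = [-x for x in arr]
    heapq.heapify(heap)
    total, count = 0, 0
    # Extract the largest remaining element until the sum reaches s:
    # O(log n) per extraction.
    while heap and total < s:
        total += -heapq.heappop(heap)
        count += 1
    return count if total >= s else None  # None: the whole array falls short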

Edit: Here is another variant that is often faster, but can be slower.

import heapq

def min_elements_online(arr, s):
    heap, current_sum, i = [], 0, 0
    while i < len(arr) and current_sum < s:  # take a prefix whose sum reaches s
        heapq.heappush(heap, arr[i])
        current_sum += arr[i]
        i += 1
    if current_sum < s:
        return None  # the whole array sums to less than s
    while current_sum - heap[0] >= s:  # shed unneeded prefix elements
        current_sum -= heapq.heappop(heap)
    for element in arr[i:]:
        if heap[0] < element:
            heapq.heappush(heap, element)
            current_sum += element
            while current_sum - heap[0] >= s:  # shed while the sum stays >= s
                current_sum -= heapq.heappop(heap)
    return len(heap)

If we get to reject most of the array without touching our heap, this can be up to twice as fast as the previous solution. But it can also be slower, for example when the last element in the array happens to be bigger than S.


Assuming the numbers are integers, you can improve upon the usual n lg(n) complexity of sorting because in this case we have the extra information that the values are between 0 and S (for our purposes, integers larger than S are the same as S).

Because the range of values is finite, you can use a non-comparative sorting algorithm such as Pigeonhole Sort or Radix Sort to go below n lg(n).

Note that these methods are dependent on some function of S, so if S gets large enough (and n stays small enough) you may be better off reverting to a comparative sort.
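
For example, with non-negative integers a pigeonhole (counting) sort gives an O(n + S) solution; here is a sketch (the function name is mine):

def min_elements_pigeonhole(arr, s):
    # Clamp values to s: any element >= s meets the target on its own,
    # and clamping never turns a passing subset into a failing one's report.
    counts = [0] * (s + 1)
    for x in arr:
        counts[min(x, s)] += 1
    total, taken = 0, 0
    for v in range(s, 0, -1):  # walk the buckets from the largest value down
        for _ in range(counts[v]):
            total += v
            taken += 1
            if total >= s:
                return taken
    return None  # all elements together still fall short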


Here is an O(n) expected time solution to the problem. It's somewhat like Moron's idea but we don't throw out the work that our selection algorithm did in each step, and we start trying from an item potentially in the middle rather than using the repeated doubling approach.

Alternatively, it's really just quickselect with a little additional bookkeeping for the remaining sum.

First, it's clear that if you had the elements in sorted order, you could just pick the largest items first until you exceed the desired sum. Our solution is going to be like that, except we'll try as hard as we can not to discover ordering information, because sorting is slow.

You want to be able to determine whether a given value is the cut-off: if we include that value and everything greater than it, we meet or exceed S, but if we remove it, we fall below S. If so, we are golden.

Here is the code in Python. I didn't test it for every edge case, but it gets the idea across.

import random

def partition(arr, p):
    # Split around the pivot value p: one copy of p is held out,
    # and any duplicates of p go to the left side.
    left, right, seen_pivot = [], [], False
    for x in arr:
        if x > p:
            right.append(x)
        elif x == p and not seen_pivot:
            seen_pivot = True
        else:
            left.append(x)
    return left, right

def solve(arr, s):
    if s <= 0:
        return 0
    if not arr:
        return None  # the remaining elements cannot reach s
    # We could get rid of the worst-case O(n^2) behavior that basically
    # never happens by selecting the median here deterministically, but
    # in practice the constant factor would be much worse.
    p = random.choice(arr)
    left_arr, right_arr = partition(arr, p)  # p is in neither side
    right_sum = sum(right_arr)
    if right_sum + p >= s:
        if right_sum < s:
            return len(right_arr) + 1  # solved it: p forms the cut-off
        # took too much; at least we eliminated left_arr and p
        return solve(right_arr, s)
    # didn't take enough yet: take all of right_arr and p, then recurse
    return len(right_arr) + 1 + solve(left_arr, s - right_sum - p)


One (asymptotic) improvement over Theta(n log n) is an O(n log K) time algorithm, where K is the required minimum number of elements.

Thus if K is constant, or say log n, this is better (asymptotically) than sorting. Of course if K is n^epsilon, then this is no better than Theta(n log n).

The way to do this is to use selection algorithms, which can tell you the ith largest element in O(n) time.

Now search for K by doubling: start with i = 1 (just the largest element) and double i at each turn until the sum of the i largest reaches S, then binary search within the final interval.

At each step you find the ith largest element, compute the sum of the i largest elements, and check whether it is at least S.

This way, you run O(log K) rounds of the selection algorithm (each O(n)) for a total running time of O(n log K).
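
A sketch of the doubling-plus-binary-search in Python; the function names are mine, and np.partition (introselect, expected O(n)) stands in for a worst-case linear selection algorithm:

import numpy as np

def top_sum(a, k):
    # Sum of the k largest values; np.partition is expected O(n).
    n = len(a)
    return np.partition(a, n - k)[n - k:].sum()

def min_elements_select(arr, s):
    a = np.asarray(arr)
    n = len(a)
    if n == 0:
        return None
    i = 1
    # Doubling phase: find an i whose top-i sum reaches s, bracketing K.
    while top_sum(a, min(i, n)) < s:
        if i >= n:
            return None  # the whole array sums to less than s
        i *= 2
    # Binary-search phase: K lies in (i // 2, min(i, n)].
    lo, hi = i // 2 + 1, min(i, n)
    while lo < hi:
        mid = (lo + hi) // 2
        if top_sum(a, mid) >= s:
            hi = mid
        else:
            lo = mid + 1
    return lo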


  1. If any number equals or exceeds S, you are done: a single element suffices.
  2. Pigeonhole sort the remaining numbers (all less than S).

Sum elements from highest to lowest in the sorted order until you meet or exceed S.
