开发者

Given 2 sorted arrays of integers, find the nth largest number in sublinear time [duplicate]

开发者 https://www.devze.com 2023-02-03 23:12 出处:网络
This question already has answers here: Closed 12 years ago. 开发者_如何学CPossible Duplicate: How to find the kth smallest element in the union of two sorted arrays?
This question already has answers here: Closed 12 years ago.

开发者_如何学CPossible Duplicate:

How to find the kth smallest element in the union of two sorted arrays?

This is a question one of my friends told me he was asked while interviewing, I've been thinking about a solution.

Sublinear time implies logarithmic to me, so perhaps some kind of divide and conquer method. For simplicity, let's say both arrays are the same size and that all elements are unique


I think this is two concurrent binary searches on the subarrays A[0..n-1] and B[0..n-1], which is O(log n).

  • Given sorted arrays, you know that the nth largest will appear somewhere before or at A[n-1] if it is in array A, or B[n-1] if it is in array B
  • Consider item at index a in A and item at index b in B.
  • Perform binary search as follows (pretty rough pseudocode, not taking in account 'one-off' problems):
    • If a + b > n, then reduce the search set
      • if A[a] > B[b] then b = b / 2, else a = a / 2
    • If a + b < n, then increase the search set
      • if A[a] > B[b] then b = 3/2 * b, else a = 3/2 * a (halfway between a and previous a)
    • If a + b = n then the nth largest is max(A[a], B[b])

I believe worst case O(ln n), but in any case definitely sublinear.


I believe that you can solve this problem using a variant on binary search. The intuition behind this algorithm is as follows. Let the two arrays be A and B and let's assume for the sake of simplicity that they're the same size (this isn't necessary, as you'll see). For each array, we can construct parallel arrays Ac and Bc such that for each index i, Ac[i] is the number of elements in the two arrays that are no larger than A[i] and Bc[i] is the number of elements in the two arrays that are no larger than B[i]. If we could construct these arrays efficiently, then we could find the kth smallest element efficiently by doing binary searches on both Ac and Bc to find the value k. The corresponding entry of A or B for that entry is then the kth largest element. The binary search is valid because the two arrays Ac and Bc are sorted, which I think you can convince yourself of pretty easily.

Of course, this solution doesn't work in sublinear time because it takes O(n) time to construct the arrays Ac and Bc. The question then is - is there some way that we can implicitly construct these arrays? That is, can we determine the values in these arrays without necessarily constructing each element? I think that the answer is yes, using this algorithm. Let's begin by searching array A to see if it has the kth smallest value. We know for a fact that the kth smallest value can't appear in the array in array A after position k (assuming all the elements are distinct). So let's focus just on the the first k elements of array A. We'll do a binary search over these values as follows. Start at position k/2; this is the k/2th smallest element in array A. Now do a binary search in array B to find the largest value in B smaller than this value and look at its position in the array; this is the number of elements in B smaller than the current value. If we add up the position of the elements in A and B, we have the total number of elements in the two arrays smaller than the current element. If this is exactly k, we're done. If this is less than k, then we recurse in the upper half of the first k elements of A, and if this is greater than k we recurse in the lower half of the first elements of k, etc. Eventually, we'll either find that the kth largest element is in array A, in which case we're done. Otherwise, repeat this process on array B.

The runtime for this algorithm is as follows. The search of array A does a binary search over k elements, which takes O(lg k) iterations. Each iteration costs O(lg n), since we have to do a binary search in B. This means that the total time for this search is O(lg k lg n). The time to do this in array B is the same, so the net runtime for the algorithm is O(lg k lg n) = O(lg2 n) = o(n), which is sublinear.


This is quite similar answer to Kirk's.

Let Find( nth, A, B ) be function that returns nth number, and |A| + |B| >= n. This is simple pseudo code without checking if one of array is small, less than 3 elements. In case of small array one or 2 binary searches in larger array is enough to find needed element.

Find( nth, A, B )
  If A.last() <= B.first():
    return B[nth - A.size()]
  If B.last() <= A.first():
    return A[nth - B.size()]
  Let a and b indexes of middle elements of A and B
  Assume that A[a] <= B[b] (if not swap arrays)
  if nth <= a + b:
    return Find( nth, A, B.first_half(b) )
  return Find( nth - a, A.second_half(a), B )

It is log(|A|) + log(|B|), and because input arrays can be made to have n elements each it is log(n) complexity.


int[] a = new int[] { 11, 9, 7, 5, 3 };
int[] b = new int[] { 12, 10, 8, 6, 4 };
int n = 7;
int result = 0;
if (n > (a.Length + b.Length))
    throw new Exception("n is greater than a.Length + b.Length");
else if (n < (a.Length + b.Length) / 2)
{
    int ai = 0;
    int bi = 0;
    for (int i = n; i > 0; i--)
    {
        // find the highest from a or b
        if (ai < a.Length)
        {
            if (bi < b.Length)
            {
                if (a[ai] > b[bi])
                {
                    result = a[ai];
                    ai++;
                }
                else
                {
                    result = b[bi];
                    bi++;
                }
            }
            else
            {
                result = a[ai];
                ai++;
            }
        }
        else
        {
            if (bi < b.Length)
            {
                result = b[bi];
                bi++;
            }
            else
            {
                // error, n is greater than a.Length + b.Length
            }
        }
    }
}
else
{
    // go in reverse
    int ai = a.Length - 1;
    int bi = b.Length - 1;
    for (int i = a.Length + b.Length - n; i >= 0; i--)
    {
        // find the lowset from a or b
        if (ai >= 0)
        {
            if (bi >= 0)
            {
                if (a[ai] < b[bi])
                {
                    result = a[ai];
                    ai--;
                }
                else
                {
                    result = b[bi];
                    bi--;
                }
            }
            else
            {
                result = a[ai];
                ai--;
            }
        }
        else
        {
            if (bi >= 0)
            {
                result = b[bi];
                bi--;
            }
            else
            {
                // error, n is greater than a.Length + b.Length
            }
        }
    }
}
Console.WriteLine("{0} th highest = {1}", n, result);


Sublinear of what though? You can't have an algorithm that doesn't check at least n elements, even verifying a solution would require checking that many. But the size of the problem here should surely mean the size of the arrays, so an algorithm that only checks n elements is sublinear.

So I think there's no trick here, start with the list with the smaller starting element and advance until you either:

  1. Reach the nth element, and you're done.
  2. Find the next element is bigger than the next element in the other list, at which point you switch to the other list.
  3. Run out of elements and switch.
0

精彩评论

暂无评论...
验证码 换一张
取 消