Determine if A+B=C exists in an array of n integers_问答_开发者

This is a problem a friend of mine received as his homework (in algorithm and data structure class). He asked me about this. However, I can't solve it and have been thinking about it for some time during the last several days.

There are n random integers in the range [0, 2³¹-1] (there may be duplicates. Determine if 3 numbers of these numbers satisfy A + B = C.

I first came up with a naive algorithm that's O(n²l开发者_StackOverflow社区og n). I then came up with an algorithm that's O(n²). Here is the pseudo code:

sort(a); // non-descending
for (i = 0; i < n; i++) {
  j = i; k = i + 1;
  while (j < n && k < n) {
    if (a[i] + a[j] == a[k])
      return true;
    else if (a[i] + a[k] < a[j])
      k++;
    else
      j++;
  }
}
return false;

However, the problem states that 1 < n <= 10⁶. I believe O(n²) is too slow. My solution doesn't make use of the randomness. However, I'm not sure if this is an important part of the problem.

The general problem is 3SUM-Hard and the question of whether there is a better than quadratic algorithm is open.

So if you need a faster algorithm, you would probably need to make use of the fact that they are 32-bit.

If numbers are random, any worst-case O(n^2) algorithm (including yours) will run very fast. In fact, the practical complexity will be O(n*logn) (the complexity of sorting).
It's much like quicksort, where we have O(n*logn) average and a tiny chance of hitting O(n^2).

10^6 random numbers give us ~ 10^6*10^6 'nearly random' sums in range ~ 0..10^9. What's the chance that one of those 10^12 random sums will be equal to a given random value in integer range? Pretty good.
Now, what's the chance that one of those 10^12 random sums will be equal to a one of 10^6 given random values? 100%, speaking poetically.

I've implemented your proposed solution, for n = 10^6 it performs on average 5000-10000 operations in the innermost loop. So much for O(n^2). Sorting is the costliest operation in there.

PS. You may reduce complexity farther and make it even O(1), if you update the solution to use hash instead of sorting.

PS 2. The test program in java, for the reference. Run it and see for yourself.

    int n = 1000000;
    int[] a = new int[n];

    // generate random array
    Random r = new Random();
    for (int i = 0; i < n; ++i) {
        do {
            a[i] = r.nextInt();
        } while (a[i] < 0);
    }

    Arrays.sort(a);

    // number of operations inside main loop
    int ops = 0;

    // main logic, pretty much as OP described it
    boolean found = false;
    for (int i = 0; i < n && !found; ++i) {
        int j = i;
        int k = i + 1;
        while (k < n) {
            ++ops;

            if (a[i] > a[k] - a[j]) {
                ++k;
            } else if (a[i] < a[k] - a[j]) {
                ++j;
            } else {
                System.out.println(a[i] + " + " + a[j] + " = " + a[k]);
                found = true;
                break;
            }
        }
    }

    System.out.println(ops);

Algorithm that uses hashing takes 10-900 microseconds in Python (average: 200 median: 60):

#!/usr/bin/env python
import random

L = frozenset(random.sample(xrange(2**31), 10**6))
print next(((a,b,a+b) for a in L for b in L if (a + b) in L), None)

It is O(N**2) but it seems it is fast enough.

For comparison, the amortized O(N) operation of creating the frozenset takes 270 milliseconds (1000 times slower then the search) and to create random list it takes 0.9 seconds.

Note: random.sample doesn't return repeated elements if an input sequence contains unique elements therefore frozenset doesn't discard any elements in the above example. To solve the problem for a random sequence that allows repeated elements we should use two data structures:

#!/usr/bin/env python
import random

L = [random.randrange(2**31) for _ in xrange(10**6)]
S = frozenset(L)
print len(L), len(S)
print next(((a, b, a+b) for a in L for b in L if (a + b) in S), None)

Output

1000000 999762
(2055933464, 83277289, 2139210753)

I'm getting O(n log n) when measuring this over sorted lists:

from bisect import bisect_right
import cProfile as prof
import random

def find3sum(T):
    if len(T) < 3:
        return None
    n = len(T)
    top = T[-1]
    for i in range(len(T)-1):
        b = top - T[i]
        if b < T[i]:
            return None
        k = bisect_right(T, b, i, n-1)
        while k > i:
            c = T[i] + T[k]
            j = bisect_right(T, c, k, n-1)
            if j <= k:
                break
            elif T[j] == c:
               return (i, k, j)
            else:
               k -= 1

def test_one(a):
    a = sorted(a)
    r = find3sum(a)
    i, k , j = r
    assert a[i] + a[k] == a[j]

def test():
    n = 100000
    max = 200000
    random.seed(0)
    for _ in range(100):
        a = [random.randint(0,max) for _x in xrange(n)]
        test_one(a)
        a = range(n)
        test_one(a)

prof.run('test()')

These are the results (about one call to bisect per element):

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.002    0.002  183.764  183.764 <string>:1(<module>)
      200    0.005    0.000   89.996    0.450 find2sum.py:25(test_one)
        1   17.269   17.269  183.762  183.762 find2sum.py:31(test)
      200   35.096    0.175   79.601    0.398 find2sum.py:5(find3sum)
 10000000   44.958    0.000   52.398    0.000 random.py:160(randrange)
 10000000   23.891    0.000   76.289    0.000 random.py:224(randint)
        1    0.000    0.000    0.000    0.000 random.py:99(seed)
 19599982   44.077    0.000   44.077    0.000 {_bisect.bisect_right}
        1    0.000    0.000    0.000    0.000 {function seed at 0x9a1972c}
      600    0.001    0.000    0.001    0.000 {len}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
 10000000    7.440    0.000    7.440    0.000 {method 'random' of '_random.Random' objects}
      301    0.635    0.002    0.635    0.002 {range}
      200   10.390    0.052   10.390    0.052 {sorted}

There are several optimizations that can considerably reduce the running time (like skipping over runs of numbers equal to the one already tested).

A+B=C, hence B=C-A or A=C-B

The above problem can be done in O(n) complexity by using hash table.

var C; // the sum you are looking for
for(each element)
    X = C - element
    boolean exists = lookup for X in hash table
    if (exists) combination A+B=C exists in the given input
    else hashtable.put(element)

Hope that helps.