Minimum of sum of absolute values_问答_开发者_运维开发者技术经验分享

Problem statement:

There are 3 arrays A,B,C all filled with positive integers, a开发者_如何学Cnd all the three arrays are of the same size.

Find min(|a-b|+|b-c|+|c-a|) where a is in A, b is in B, c is in C.

I worked on the problem the whole weekend. A friend told me that it can be done in linear time. I don't see how that could be possible.

How would you do it ?

Well, I think I can do it in O(n log n). I can only do O(n) if the arrays are initially sorted.

First, observe that you can permute a,b,c however you like without changing the value of the expression. So let x be the smallest of a,b,c; let y be the middle of the three; and let z be the maximum. Then note that the expression just equals 2*(z-x). (Edit: This is easy to see... Once you have the three numbers in order, x < y < z, the sum is just (y-x) + (z-y) + (z-x) which equals 2*(z-x))

Thus, all we are really trying to do is find three numbers such that the outer two are as close together as possible, with the other number "sandwiched" between them.

So start by sorting all three arrays in O(n log n). Maintain an index into each array; call these i, j, and k. Initialize all three to zero. Whichever index points to the smallest value, increment that index. That is, if A[i] is smaller than B[j] and C[k], increment i; if B[j] is smallest, increment j; if C[k] is smallest, increment k. Repeat, keeping track of |A[i]-B[j]| + |B[j]-C[k]| + |C[k]-A[i]| the whole time. The smallest value you observe during this march is your answer. (When the smallest of the three is at the end of its array, stop because you are done.)

At each step, you add one to exactly one index; but you can only do this n times for each array before hitting the end. So this is at most 3*n steps, which is O(n), which is less than O(n log n), meaning the total time is O(n log n). (Or just O(n) if you can assume the arrays are sorted.)

Sketch of a proof that this works: Suppose A[I], B[J], C[K] are the a, b, c that form the actual answer; i.e., they have the minimum |a-b|+|b-c|+|c-a|. Suppose further that a > b > c; the proof for the other cases is symmetric.

Lemma: During our march, we do not increment j past J until after we increment k past K. Proof: We always increment the index of the smallest element, and when k <= K, B[J] > C[k]. So when j=J and k <= K, B[j] is not the smallest element, so we do not increment j.

Now suppose we increment k past K before i reaches I. What do things look like just before we perform that increment? Well, C[k] is the smallest of the three at that moment, because we are about to increment k. A[i] is less than or equal to A[I], because i < I and A is sorted. Finally, j <= J because k <= K (by our Lemma), so B[j] is also less than A[I]. Taken together, this means our sum-of-abs-diff at this moment is less than 2*(c-a), which is a contradiction.

Thus, we do not increment k past K until i reaches I. Therefore, at some point during our march i=I and k=K. By our Lemma, at this point j is less than or equal to J. So at this point, either B[j] is less than the other two and j will get incremented; or B[j] is between the other two and our sum is just 2*(A[i]-C[k]), which is the right answer.

This proof is sloppy; in particular, it fails to explicitly account for the case where one or more of a,b,c are equal. But I think that detail can be worked out pretty easily.

I would write a really simple program like this:

#!/usr/bin/python
import sys, os, random
A = random.sample(range(100), 10)
B = random.sample(range(100), 10)
C = random.sample(range(100), 10)
minsum = sys.maxint
for a in A:
 for b in B:
  for c in C:
   print 'checking with a=%d b=%d c=%d' % (a, b, c)
   abcsum = abs(a - b) + abs(b - c) + abs(c - a)
   if abcsum < minsum:
    print 'found new low sum %d with a=%d b=%d c=%d' % (abcsum, a, b, c)
    minsum = abcsum

And test it over and over until I saw some pattern emerge. The pattern I found here is what would be expected: the numbers that are closest together in each set, regardless of whether the numbers are "high" or "low", are those that produce the smallest minimum sum. So it becomes a nearest-number problem. For whatever that's worth, probably not much.