Algorithm to "transfer water from a set of bottles to another one" (metaphorically speaking)_问答_开发者

Ok, I have a problem. I have a set "A" of bottles of various sizes, all full of water. Then I have another set "B" of bottles of various sizes, all empty.

I want to transfer the water from A to B, knowing that the total capacity of each set is the same. (i.e.: Set A contains the same amount of water as set B).

This is of course trivial in itself, just take the first bottle in B, pour it in the first in A until this is full. Then if the bottle from B has still water in it, go on with the second bottle in A, etc.

However, I want to minimize the total number of pours (the action of pouring from a bottle into another, each action counts 1, independently from how much water it involves)

I'd like to find a greedy algorithm to do this, o开发者_JAVA技巧r if not possible at least an efficient one. However, efficiency is secondary to correctness of the algorithm (I don't want a suboptimal solution).

Of course this problem is just a metaphor for a real problem in a computer program to manage personal expenses.

Bad news: this problem is NP-hard by a reduction from subset sum. Given numbers x₁, …, x_n, S, the object of subset sum is to determine whether or not some subset of the x_is sum to S. We make A-bottles with capacities x₁, …, x_n and B-bottles with capacities S and (x₁ + … + x_n - S) and determine whether n pours are sufficient.

Good news: any greedy strategy (i.e., choose any nonempty A, choose any unfilled B, pour until we have to stop) is a 2-approximation (i.e., uses at most twice as many pours as optimal). The optimal solution uses at least max(|A|, |B|) pours, and greedy uses at most |A| + |B|, since every time greedy does a pour, either an A is drained or a B is filled and does not need to be poured out of or into again.

~~There might be an approximation scheme (a (1 + ε)-approximation for any ε > 0).~~ I think now it's more likely that there's an inapproximability result – the usual tricks for obtaining approximation schemes don't seem to apply here.

Here are some ideas that might lead to a practical exact algorithm.

Given a solution, draw a bipartite graph with left vertices A and right vertices B and an (undirected) edge from a to b if and only if a is poured into b. If the solution is optimal, I claim that there are no cycles – otherwise we could eliminate the smallest pour in the cycle and replace the lost volume going around the cycle. For example, if I have pours

a1 -> b1: 1
a1 -> b2: 2
a2 -> b1: 3
a2 -> b3: 4
a3 -> b2: 5
a3 -> b3: 6

then I can eliminate by a1 -> b1 pour like so:

a2 -> b1: 4 (+1)
a2 -> b3: 3 (-1)
a3 -> b3: 7 (+1)
a3 -> b2: 4 (-1)
a1 -> b2: 3 (+1)

Now, since the graph has no cycle, we can count the number of edges (pours) as |A| + |B| - #(connected components). The only variable here is the number of connected components, which we want to maximize.

I claim that the greedy algorithm forms graphs that have no cycle. If we knew what the connected components of an optimal solution were, we could use a greedy algorithm on each one and get an optimal solution.

One way to tackle this subproblem would be to use dynamic programming to enumerate all subset pairs X of A and Y of B such that sum(X) == sum(Y) and then feed these into an exact cover algorithm. Both steps are of course exponential, but they might work well on real data.

Here's my take:

Identify bottles having the exact same size in both sets. This translate to one-to-one pour for these same-size bottles.
Sort the remaining bottles in A in descending order by capacity, and sort remaining bottles in B in ascending order. Compute the number of pours you need when pouring sorted list in A to B.

Update: After each pour in step 2, repeat step 1. (Optimization step suggested by Steve Jessop). Rinse and repeat until all water is transferred.

i think this gives the minimum number of pours:

import bisect

def pours(A, B):
    assert sum(A) == sum(B)
    count = 0
    A.sort()
    B.sort()
    while A and B:
        i = A.pop()
        j = B.pop()
        if i == j:
            count += 1
        elif i > j:
            bisect.insort(A, i-j)
            count += 1
        elif i < j:
            bisect.insort(B, j-i)
            count += 1
    return count

A=[5,4]
B=[4,4,1]
print pours(A,B)
# gives 3

A=[5,3,2,1] 
B=[4,3,2,1,1]
print pours(A,B)
# gives 5

in English it reads:

assert that both lists have the same sum (i think the algorithm will still work if sum(A) > sum(B) or sum(A) < sum(B) is true)
take the two lists A and B, sort both them

while A isn't empty and B isn't empty:

take i (the largest) from A and j (the largest) from B
if i equals j, pour i in j and count 1 pour
if i is larger than j, pour i in j, place i-j remainder back in A (using an insertion sort), count 1 pour
if i is smaller than j, pour i in j, place j-i remainder back in B (using an insertion sort), count 1 pour