开发者

Algorithm for partitioning set of sets into approximately equal-sized chunks?

开发者 https://www.devze.com 2023-02-15 12:01 出处:网络
Consider a set A of n finite sets, whose members are not necessarily disjoint. Let P={P[1], P[2], ..., 开发者_如何转开发P[m]} be a partition of A, and for each i in 1..m, let U[i] be the union of all

Consider a set A of n finite sets, whose members are not necessarily disjoint. Let P={P[1], P[2], ..., 开发者_如何转开发P[m]} be a partition of A, and for each i in 1..m, let U[i] be the union of all of the elements of P[i]. So U={U[1], U[2], ..., U[m]}. I would like an algorithm to a find a P such that the corresponding U is a partition, and such that the difference in cardinality (i.e. size) between the smallest and largest elements of U is minimised.

Characteristics of the data:

  • m is small (2 to 5) and n<10000
  • Typically, there is a large proportion of 1-element sets in A
  • Intersections between pairs of sets in A are typically small or empty


My necklace analogy in the question comments suggests this solution:

  1. Build an undirected graph G whose vertices are the elements of A, and where there is an edge from A[i] to A[j] iff A[i] intersects A[j].
  2. Find the connected components C of G. This can be done with a simple breadth-first or depth-first algorithm.
  3. For each C[i], take the vertices of C[i] and union them together, yielding D[i]. You now have reduced the problem to a special case, because the set D is a partition of the union of the elements of A.
  4. Use a bin-packing algorithm to try and fit the elements of D into precisely m bins, each of size ceil(t/m) where t is the size of the union of all the elements of D. If that fails, increase the sizes of the bins repeatedly until either it succeeds or it's clear that it's never going to succeed. Bin-packing algorithms are typically heuristic, so a perfect solution might not be found. Also, this is more than a simple bin-packing problem, so even a perfect bin-packing algorithm might not find the optimal solution.

I'd be interested to know if there is a more efficient solution. In particular, I have a hunch that the repeated use of the bin-packing algorithm in step 4 is not sensible.


I'd start with the intersection graph of A, which has nodes elements of A and edges when the two nodes have a non-empty intersection. Each connected component of this graph has to be contained in a single P(i) for some i.

let C(1),...,C(k) be the connected components of the graph. Let

size(j)=|union(a in C(j))|

Now you can rewrite the problem in terms of the size(i) values, with i=1...k. Namely, given positive integer values s(1),..,s(k). For a subset P of [1,..k] we define s(P)=sum(j in P) s(j).

We want to find a partition P'=(P'(1),...,P'(m)) of [1,..,k] with the condition that it minimizes the value:

max s(P'(j)) - min s(P'(j))

Therefore, we really need to know not the likely sizes of the elements of A, but the likely sizes of the connected components of the graph to come up with a "best" algorithm.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号