开发者

An algorithm to sort a list of values into n groups so that the sum of each group is as close as possible

开发者 https://www.devze.com 2023-02-15 23:55 出处:网络
Basically I have a number of values that I need to split into n different groups so that the sums of each group are as close as possible to the sums of the others? The list of values isn\'t terribly l

Basically I have a number of values that I need to split into n different groups so that the sums of each group are as close as possible to the sums of the others? The list of values isn't terribly long so I could potentially just brute force it but I was wond开发者_如何转开发ering if anyone knows of a more efficient method of doing this. Thanks.


If an approximate solution is enough, then sort the numbers descendingly, loop over them and assign each number to the group with the smallest sum.

groups = [list() for i in range(NUM_GROUPS)]
for x in sorted(numbers, reverse=True):
    mingroup = groups[0]
    for g in groups:
        if sum(g) < sum(mingroup):
           mingroup = g
    mingroup.append(x)


This problem is called "multiway partition problem" and indeed is computationally hard. Googling for it returned an interesting paper "Multi-Way Number Paritioning where the author mentions the heuristic suggested by larsmans and proposes some more advanced algorithms. If the above heuristic is not enough, you may have a look at the paper or maybe contact the author, he seems to be doing research in that area.


Brute force might not work out as well as you think...

Presume you have 100 variables and 20 groups:

  • You can put 1 variable in 20 different groups, which makes 20 combinations.
  • You can put 2 variables in 20 different groups each, which makes 20 * 20 = 20^2 = 400 combinations.
  • You can put 3 variables in 20 different groups each, which makes 20 * 20 * 20 = 20^3 = 8000 combinations.
  • ...
  • You can put 100 variables in 20 different groups each, which makes 20^100 combinations, which is more than the minimum number of atoms in the known universe (10^80).

OK, you can do that a bit smarter (it doesn't matter where you put the first variable, ...) to get to something like Branch and Bound, but that will still scale horribly.

So either use a fast deterministic algorithm, like larsman proposes. Or if you need a more optimal solution and have the time to implement it, take a look at metaheuristic algorithms and software that implement them (such as Drools Planner).


You can sum the numbers and divide by the number of groups. This gives you the target value for the sums. Sort the numbers and then try to get subsets to add up to the required sum. Start with the largest values possible, as they will cause the most variability in the sums. Once you decide on a group that is not the optimal sum (but close), you could recompute the expected sum of the remaining numbers (over n-1 groups) to minimize the RMS deviation from optimal for the remaining groups (if that's a metric you care about). Combining this "expected sum" concept with larsmans answer, you should have enough information to arrive at a fast approximate answer. Nothing optimal about it, but far better than random and with a nicely bounded run time.


Do you know how many groups you need to split it into ahead of time?

Do you have some limit to the maximum size of a group?

A few algorithms for variations of this problem:

  • Knuth's word-wrap algorithm

  • algorithms minimizing the number of floppies needed to store a set of files, but keeping any one file immediately readable from the disk it is stored on (rather than piecing it together from fragments stored on 2 disks) -- I hear that "copy to floppy with best fit" was popular.

  • Calculating a cutting list with the least amount of off cut waste. Calculating a cutting list with the least amount of off cut waste

  • What is a good algorithm for compacting records in a blocked file? What is a good algorithm for compacting records in a blocked file?

  • Given N processors, how do I schedule a bunch of subtasks such that the entire job is complete in minimum time? multiprocessor scheduling.

0

精彩评论

暂无评论...
验证码 换一张
取 消