开发者

Population segmentation algorithm

开发者 https://www.devze.com 2023-03-26 00:48 出处:网络
I have a population of 50 ordered integers (1,2,3,..,50) and I look for a generic way to slice it \"n\" ways (\"n\" is the number of cutoff points ranging from 1 to 25) that maintains the order of the

I have a population of 50 ordered integers (1,2,3,..,50) and I look for a generic way to slice it "n" ways ("n" is the number of cutoff points ranging from 1 to 25) that maintains the order of the elements.

For example, for n=1 (one cutoff point) there are 49 possible grouping alternatives ([1,2-49], [1-2,3-50], [1-3,4-50],...). For n=2 (two cutoff points), the grouping alternatives are like: [1,2,3-50], [1,2-3,4-50],...

Could you recommend any general-purpose algorithm to complete this task in an efficient way?

Thanks, Chris


Thanks everyone for your feedback. I reviewed all your comments and I am working on a generic solution that will return all combinations (e.g., [1,2,3-50], [1,2-3,4-50],...) for all n开发者_StackOverflow社区umbers of cutoff points.

Thanks again, Chris


Let sequence length be N, and number of slices n.

That problem becomes easier when you notice that, choosing a slicing to n slices is equivalent to choosing n - 1 from N - 1 possible split points (a split point is between every two numbers in the sequence). Hence there is (N - 1 choose n - 1) such slicings.

To generate all slicings (to n slices), you have to generate all n - 1 element subsets of numbers from 1 to N - 1.

The exact algorithm for this problem is placed here: How to iteratively generate k elements subsets from a set of size n in java?


Do you need the cutoffs, or are you just counting them. If you're just going to count them, then it's simple:

1 cutoff = (n-1) options

2 cutoffs = (n-1)*(n-2)/2 options

3 cutoffs = (n-1)(n-2)(n-3)/4 options

you can see the patterns here

If you actually need the cutoffs, then you have to actually do the loops, but since n is so small, Emilio is right, just brute force it.

1 cutoff

for(i=1,i<n;++i)
  cout << i;

2 cutoffs

for(i=1;<i<n;++i)
  for(j=i+1,j<n;++j)
    cout << i << " " << j;

3 cutoffs

for(i=1;<i<n;++i)
  for(j=i+1,j<n;++j)
    for(k=j+1,k<n;++k)
      cout << i << " " << j << " " << k;

again, you can see the pattern


So you want to select 25 split point from 49 choices in all possible ways. There are a lot of well known algorithms to do that.

I want to draw your attention to another side of this problem. There are 49!/(25!*(49-25)!) = 63 205 303 218 876 >= 2^45 ~= 10^13 different combinations. So if you want to store it, the required amount of memory is 32TB * sizeof(Combination). I guess that it will pass 1 PB mark.

Now lets assume that you want to process generated data on the fly. Lets make rather optimistic assumption that you can process 1 million combinations per second (here i assume that there is no parallelization). So this task will take 10^7 seconds = 2777 hours = 115 days.

This problem is more complicated than it seems at first glance. If you want to solve if at home in reasonable time, my suggestion is to change the strategy or wait for the advance of quantum computers.


This will generate an array of all the ranges, but I warn you, it'll take tons of memory, due to the large numbers of results (50 elements with 3 splits is 49*48*47=110544) I haven't even tried to compile it, so there's probably errors, but this is the general algorithm I'd use.

typedef std::vector<int>::iterator iterator_t;
typedef std::pair<iterator_t, iterator_t> range_t;
typedef std::vector<range_t> answer_t;
answer_t F(std::vector<int> integers, int slices) {
    answer_t prev;  //things to slice more
    answer_t results; //thin
    //initialize results for 0 slices
    results.push_back(answer(range(integers.begin(), integers.end()), 1));
    //while there's still more slicing to do
    while(slices--) {  
        //move "results" to the "things to slice" pile
        prev.clear(); 
        prev.swap(results); 
        //for each thing to slice
        for(int group=0; group<prev.size(); ++group) { 
            //for each range
            for(int crange=0; crange<prev[group].size(); ++crange) { 
                //for each place in that range
                for(int newsplit=0; newsplit<prev[group][crange].size(); ++newsplit) { 
                    //copy the "result"
                    answer_t cur = prev[group];
                    //slice it
                    range_t L = range(cur[crange].first, cur[crange].first+newsplit);
                    range_t R = range(cur[crange].first+newsplit), cur[crange].second);
                    answer_t::iterator loc = cur.erase(cur.begin()+crange);
                    cur.insert(loc, R);
                    cur.insert(loc, L);
                    //add it to the results
                    results.push_back(cur);
                }
            }
        }
    }
    return results;
}
0

精彩评论

暂无评论...
验证码 换一张
取 消