I have been trying to wrap my head around this for a while now but have not been able to come up with a good solution. Here goes:
Given a number of sets:
set1: A, T
set2: C
set3: A, C, G
set4: T
set5: G
I want to generate all possible sequences from a list of sets. In this example the length of the sequence is 5, but it can be any length up to around 20. For position 1 the possible candidates are 'A' and 'T' respectively, for position 2 the only option is 'C' and so on.
The answer for the example above would be:
ACATG, ACCTG, ACGTG, TCATG, TCCTG, TCGTG
I am doing this in ruby and I have the different sets as arrays within a master array:
[[A, T], [C], [A, C, G], [T], [G]]
At first I thought a r开发者_如何学运维ecursive solution would be best but I was unable figure out how to set it up properly.
My second idea was to create another array of the same size with an index for each set. So 00000 would correspond to the first sequence above 'ACATG' and 10200 would correspond to 'TCGTG'. Beginning with 00000 I would increase the last index by one and modulo it with the length of the set in question (2 for set1, 1 for set2 above) and if the counter wrapped around I would zero it and increase the previous one by one.
But the more I thought about this solution it seemed too complex for this very small problem. There must be a more straight-forward solution that I am missing. Could anyone help me out?
/Nick
The Array class in Ruby 1.8.7 has an Array#product method, which returns the cartesian product of the arrays in question.
irb(main):001:0> ['A', 'T'].product(['C'], ['A', 'C', 'G'], ['T'], ['G'])
=> [["A", "C", "A", "T", "G"], ["A", "C", "C", "T", "G"], ["A", "C", "G", "T", "G"], ["T", "C", "A", "T", "G"], ["T", "C", "C", "T", "G"], ["T", "C", "G", "T", "G"]]
精彩评论