I'm attempting to write a basic dna sequencer. In that, given two sequences of the same length, it will output the strings which are the same, with a minimal length of 3. So input of
abcdef dfeabc
will return
1 abc
I am not sure how to go about solving the problem.
I can compare the two strings, and see if they are completely equal. From there, i can compare length-1 string size, i.e. if dfeabc
contains abcde
. However, how can i get the program to do all possible str开发者_StackOverflow中文版ings, down to a minimal size of 3 characters?
One issue is for the above example of length-1, I'd also have to do the string bcdef and compare that.
The naive way would be to get every single substring of string A and see if it's in string B.
Here's the naive way of doing that:
for ( int i = 0; i < a.length; i++ ) {
for ( int j = i+1; j <= a.length; j++ ) {
if (b.contains(a.substring(i, j))) {
//if longer than last found match, record this match
}
}
}
The slightly more optimal way would be to look at longer substrings first, so that the first substring that matches is necessarily the longest.
for ( int length = a.length; length > 0; length-- ) {
for ( int i = 0; i + length < a.length; i++ ) {
if ( b.contains(a.substring(i, i+length)) ) {
//this is the biggest match
}
}
}
If you wanted to solve this in a simple way, without using any Java API for searching, you could think of it like this: for every pair of possible starting indices i in the first string and j in the second string, step forward in both while the corresponding characters in the first and the second string are equal, and return a result if you did at least 3 steps.
You need to use the Longest Common Substring algorithm, a dynamic programming problem. Check Wikipedia's entry for an explanation of the algorithm and here for a sample implementation.
精彩评论