During a recent job interview, I was asked to give a solution to the following problem:
Given a string s (without spaces) and a dictionary, return the words in the dictionary that compose the string. For example, s = peachpie, dic = {peach, pie}, result = {peach, pie}.
I will ask about the decision variant of this problem: if s can be composed of words in the dictionary, return yes, otherwise return no.
My solution uses backtracking (written in Java):
public static boolean words(String s, Set<String> dictionary) {
    if ("".equals(s))
        return true;
    for (int i = 0; i <= s.length(); i++) {
        String pre = prefix(s, i); // returns s[0..i-1]
        String suf = suffix(s, i); // returns s[i..s.len]
        if (dictionary.contains(pre) && words(suf, dictionary))
            return true;
    }
    return false;
}

// helpers matching the comments above
private static String prefix(String s, int i) {
    return s.substring(0, i);
}

private static String suffix(String s, int i) {
    return s.substring(i);
}

public static void main(String[] args) {
    Set<String> dic = new HashSet<String>();
    dic.add("peach");
    dic.add("pie");
    dic.add("1");
    System.out.println(words("peachpie1", dic)); // true
    System.out.println(words("peachpie2", dic)); // false
}
What is the time complexity of this solution? I'm calling recursively in the for loop, but only for the prefixes that are in the dictionary.
Any ideas?
You can easily create a case where the program takes at least exponential time to complete. Take the word aaa...aaab, where a is repeated n times, and a dictionary containing only two words, a and aa. The b at the end ensures that the function never finds a match and thus never exits prematurely.
On each execution of words, two recursive calls are spawned: one with suffix(s, 1) and one with suffix(s, 2). The execution time therefore grows like the Fibonacci numbers: t(n) = t(n - 1) + t(n - 2). (You can verify this by inserting a counter.) So the complexity is certainly not polynomial. (And this is not even the worst possible input.)
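For instance, here is a minimal sketch of that counting experiment; wordsCounted and callCount are names introduced here purely for illustration, following the same backtracking idea as the code in the question:

static long callCount = 0;

public static boolean wordsCounted(String s, Set<String> dictionary) {
    callCount++; // count every invocation
    if ("".equals(s))
        return true;
    for (int i = 1; i <= s.length(); i++) {
        if (dictionary.contains(s.substring(0, i)) && wordsCounted(s.substring(i), dictionary))
            return true;
    }
    return false;
}

// With the dictionary {"a", "aa"} and the inputs "ab", "aab", "aaab", ...,
// the value of callCount after each run grows roughly like the Fibonacci numbers.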
But you can easily improve your solution with memoization. Notice that the output of words depends on one thing only: the position in the original string at which we start. E.g., if we have the string abcdefg and words(5) is called, it doesn't matter how exactly abcde was composed (as ab+c+de, or a+b+c+d+e, or something else). Thus, we don't have to recalculate words("fg") each time.
In a primitive version, this can be done like this (processed is a Set<String> holding inputs that are already known to fail):

static Set<String> processed = new HashSet<String>();

public static boolean words(String s, Set<String> dictionary) {
    if (processed.contains(s)) {
        // we've already processed string 's' with no luck
        return false;
    }
    // your normal computations
    // ...
    // if no match found, add 's' to the list of checked inputs
    processed.add(s);
    return false;
}
PS: Still, I do encourage you to change words(String) to words(int). This way you'll be able to store the results in an array and even transform the whole algorithm into DP (which would make it much simpler); see the sketch below.
edit 2
Since I don't have much to do besides work, here's the DP (dynamic programming) solution. Same idea as above:
String s = "peachpie1";
int n = s.length();
boolean[] a = new boolean[n + 1];
// a[i] tells whether s[i..n-1] can be composed from words in the dictionary
a[n] = true; // the empty string can always be composed
for (int start = n - 1; start >= 0; --start) {
    for (String word : dictionary) {
        if (start + word.length() <= n && a[start + word.length()]) {
            // check if 'word' is a prefix of s[start..n-1]
            String test = s.substring(start, start + word.length());
            if (test.equals(word)) {
                a[start] = true;
                break;
            }
        }
    }
}
System.out.println(a[0]);
Here's a dynamic programming solution that counts the total number of ways to decompose the string into words. It solves your original problem, since the string is decomposable if the number of decompositions is positive.
def count_decompositions(dictionary, word):
    n = len(word)
    results = [1] + [0] * n
    for i in xrange(1, n + 1):
        for j in xrange(i):
            if word[n - i:n - j] in dictionary:
                results[i] += results[j]
    return results[n]
Storage O(n), and running time O(n^2).
The loop over the whole string takes n steps. Finding all the suffixes and prefixes takes n + (n - 1) + (n - 2) + ... + 1 steps (n for the first call of words, n - 1 for the second, and so on), which is SUM(1..n) = (n^2 + n)/2 and in complexity-theoretic terms is equivalent to n^2.
Checking for existence in a HashSet is Theta(1) in the normal case, but O(n) in the worst case.
So the normal-case complexity of your algorithm is Theta(n^2), and the worst case is O(n^3).
EDIT: I confused the order of recursion and iteration, so this answer is wrong. Actually, the running time depends on n exponentially (compare with the naive computation of Fibonacci numbers, for example).
A more interesting question is how to improve your algorithm. Traditionally, a suffix tree is used for string operations. You can build a suffix tree from your string and mark all the nodes as "untracked" at the start of the algorithm. Then go through the strings in the set and, each time a node is used, mark it as "tracked". If all the strings in the set are found in the tree, it means the original string contains all the substrings from the set. And if all the nodes are marked as tracked, it means the string consists only of substrings from the set.
The actual complexity of this approach depends on many factors, such as the tree-building algorithm, but at least it lets you divide the problem into several independent subtasks and so measure the final complexity by the complexity of the most expensive subtask.