I have a list of about 5开发者_开发知识库0 keywords and about 50000 strings. I check every string if it contains at least one of the keywords. I'm not interested in the matched keyword or the number of matched keywords. I only want a "true" or "false" back, as fast as possible.
So, I bet there's an algorithm out there that outperforms my current LINQ version by far:
class MyEnumerableExtension
{
public static bool ContainsAny(this string searchString, IEnumerable<string> keywords)
{
return keywords.Any(keyword => searchString.Contains(keyword))
}
}
bool foundAny = "abcdef".ContainsAny(new string[] { "ac", "bd", "cd" } );
isn't this in essence the same as your other question of today Efficient algorithm for finding all keywords in a text except modified to return once a match has been found?
There are multiple algorithms to search a text for a set of substrings.
Yo can implement Knuth-Morris-Pratt algorithm.
A quick analysis shows that you are iteratively searching for the keywords. If you could search in one pass for all fo the keywords, you should have an overall improvement in your algorithm. A Regex expression can do that and couple it with the "Compiled" option and you should begin to see a performance increase (because it will single pass the string for all keywords). But, it would only benefit you if you have several keywords. Here's a quick idea to help you along, but note, I have not actually tested the performance against your algorithm.
string[] keywords = { "ac", "bd", "cd" };
string[] tosearch = { "abcdef" };
string pattern = String.Join("|", keywords);
Regex regex = new Regex(pattern, RegexOptions.Compiled);
foundAny = regex.IsMatch(String.Join("|", tosearch));
Also note, this works as long as your keywords do not contain any Regex special characters (and your search strings do not contain the pipe symbol. However, the special characters could be overcome with escape sequences, and the search strings do not have to be joined as I've done.
精彩评论