I would like to know: if I have an English dictionary in a text file, what is the best way to check whether a given string is a valid English word? My dictionary contains about 100,000 English words, and I have to check about 60,000 words on average in one go. I am just looking for the most efficient way. Also, should I store all the strings first, or should I just process them as they are generated?
Thanks
100k is not a large number, so you can just put everything in a HashSet<string>.
HashSet lookup is hash-based, so a Contains check takes constant time on average and will be very fast.
An example of how this might look in code:
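    using System;
    using System.Collections.Generic;
    using System.IO;

    // Load every dictionary word (one per line) into a hash set.
    string[] lines = File.ReadAllLines(@"C:\MyDictionary.txt");
    HashSet<string> myDictionary = new HashSet<string>();
    foreach (string line in lines)
    {
        myDictionary.Add(line);
    }

    // Check a single candidate word against the set.
    string word = "aardvark";
    if (myDictionary.Contains(word))
    {
        Console.WriteLine("There is an aardvark");
    }
    else
    {
        Console.WriteLine("The aardvark is a lie");
    }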
You should probably use HashSet<string> if you're using .NET 3.5 or higher.
Just load the dictionary of valid words into a HashSet<string> and then either use Contains on each candidate string, or use some of the set operators to find all words which aren't valid.
For example:
    // There are loads of ways of loading words from a file, of course
    var valid = new HashSet<string>(File.ReadAllLines("dictionary.txt"));
    var candidates = new HashSet<string>(File.ReadAllLines("candidate.txt"));
    // Intersect and Except are LINQ extension methods (using System.Linq)
    // and are evaluated lazily, when the results are enumerated.
    var validCandidates = candidates.Intersect(valid);
    var invalidCandidates = candidates.Except(valid);
You may also wish to use case-insensitive comparisons or something similar - use the StringComparer static properties to get at appropriate instances of StringComparer which you can pass to the HashSet constructor.
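As a rough sketch of the case-insensitive variant, the comparer can be passed as the second constructor argument. The class and method names here (CaseInsensitiveLookup, Build) are made up for illustration, and StringComparer.OrdinalIgnoreCase is only one possible choice; a culture-aware comparer may suit some dictionaries better.

```csharp
using System;
using System.Collections.Generic;

static class CaseInsensitiveLookup
{
    // Hypothetical helper: builds a set whose Contains check ignores case.
    public static HashSet<string> Build(IEnumerable<string> words)
    {
        // The comparer is used for both hashing and equality,
        // so "Aardvark" and "aardvark" count as the same entry.
        return new HashSet<string>(words, StringComparer.OrdinalIgnoreCase);
    }
}
```

With this, something like CaseInsensitiveLookup.Build(File.ReadAllLines("dictionary.txt")).Contains("Aardvark") should succeed even if the file only lists "aardvark".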
If you're using .NET 2, you can use a Dictionary<string, whatever> as a poor man's set - basically use whatever you like as the value, and just check for keys.
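A minimal sketch of that workaround, assuming bool as the throwaway value type. The names DictionaryAsSet, Build and IsValid are invented for this example, and the code sticks to C# 2 syntax (no var, no LINQ).

```csharp
using System;
using System.Collections.Generic;

static class DictionaryAsSet
{
    // Hypothetical helper: stores each word as a key; the bool value is a
    // placeholder that is never read.
    public static Dictionary<string, bool> Build(IEnumerable<string> words)
    {
        Dictionary<string, bool> set = new Dictionary<string, bool>();
        foreach (string word in words)
        {
            // Indexer assignment overwrites instead of throwing on duplicates,
            // unlike Add.
            set[word] = true;
        }
        return set;
    }

    // Membership test: only the keys matter.
    public static bool IsValid(Dictionary<string, bool> set, string candidate)
    {
        return set.ContainsKey(candidate);
    }
}
```

ContainsKey is a hash lookup just like HashSet<string>.Contains, so the performance characteristics should be much the same; the only cost is the unused value stored per entry.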