I have a list of strings that I need to check against an English dictionary. However I don't want to start checking every piece of gibberish in the list. First, I want to check if the string could be an English word.
Does anyone know of an algorithm that does this or at least the rules that I need to apply to verify a word?
For example:
No spoken word can star开发者_Python百科t with more than 3 consonants, and if there are are 3 initial consonants in a word, the first one must be "s".
Finding a word in a data structure is going to be fast (e.g. use a Bloom filter (mind the false positives!), or a set) so chances are it is not worth doing this for efficiency reasons.
If you want to provide suggestions, then look at Peter Norvig's spell checking implementation.
If you really want to go that way, then I'd construct frequencies of A follows B from existing text to see whether any given sequence is contained within English words.
This sort of task is what computers are for. Use some kind of set structure (maybe a bloom filter) to store all the words in a dictionary, and just check your word against it. That's a constant time operation.
精彩评论