Check whether an input string contains only allowable words from a file

Developer https://www.devze.com 2023-02-22 05:33 Source: Internet
I'm starting to write a program that checks whether the words a user enters are spelled correctly and, if they are not, corrects them letter by letter, for example by moving a letter from one position to another. The list of valid words comes from a .txt file.

e.g. input:

"tihs is nto a corerct sentnece" (this is not a correct sentence)

If the user enters a misspelled word, the program should scan the .txt file, find the nearest correct word, use it to fix the input, and then output the corrected sentence, like:

"this is not a correct sentence" from (tihs is nto a corerct sentnece)

Every incorrect word is checked against the .txt file.

My question is: how should I start coding this? Thanks.


From "How to write a spelling corrector" by Peter Norvig:

The full details of an industrial-strength spell corrector like Google's would be more confusing than enlightening, but I figured that on the plane flight home, in less than a page of code, I could write a toy spelling corrector that achieves 80 or 90% accuracy at a processing speed of at least 10 words per second.

Peter Norvig is a very talented computer scientist, and a great explainer, so I highly recommend his blog.
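As a rough illustration of the idea, here is a minimal sketch that generates every string one edit away from a misspelled word and keeps those found in the word list. It is a simplification and not Norvig's published code: the names `edits1`, `correct_word`, `correct_sentence` and the hard-coded `VOCABULARY` are assumptions for this example, and ties are broken alphabetically rather than by how common a word is. In the real program the vocabulary would be read from the .txt file.

```python
import string


def edits1(word):
    """Return every string that is one edit away from `word`."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)


def correct_word(word, vocabulary):
    """Return `word` if known, else a known word one edit away, else `word`."""
    if word in vocabulary:
        return word
    candidates = sorted(edits1(word) & vocabulary)  # alphabetical tie-break
    return candidates[0] if candidates else word


def correct_sentence(sentence, vocabulary):
    """Correct each whitespace-separated word independently."""
    return " ".join(correct_word(w, vocabulary) for w in sentence.lower().split())


# Hypothetical word list; in practice, load one word per line from the .txt file.
VOCABULARY = {"this", "is", "not", "a", "correct", "sentence"}

print(correct_sentence("tihs is nto a corerct sentnece", VOCABULARY))
```

Every misspelling in the example sentence is a single swap of neighbouring letters, so one round of edits is enough here. Words that are two or more edits away would be left unchanged by this sketch.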


First, you obviously need to find the words that are spelled incorrectly. Next, you need a way to assign a value to each word that could be the intended one. For example, "folor" could be "floor" with two letters jumbled, or "color" with an 'f' in place of the 'c', and so on. Both candidates are really close: one has two mixed-up letters, and the other has a character replaced by one near it on the keyboard. You would have to weight each kind of error by how common a mistake you think it is. In general, you could put each low-valued candidate into a priority queue and pull from there. If the only case is the one you describe (swapped letters), the pool of candidates is smaller, which makes things a little easier, but you would still have to assign a value to each word.
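The scoring idea can be sketched with Python's `heapq` module as the priority queue. The two cost constants and the function names `edit_cost` and `ranked_candidates` are assumptions made up for this example. The costs encode a guess that swapping two neighbouring letters is a more common mistake than typing one wrong letter, and you would tune them yourself.

```python
import heapq

# Hypothetical costs: a lower cost means "more likely the intended word".
TRANSPOSITION_COST = 1.0   # two neighbouring letters swapped
SUBSTITUTION_COST = 1.5    # one letter replaced by another


def edit_cost(typo, candidate):
    """Return a cost if `candidate` is one swap or one substitution away, else None."""
    if len(typo) != len(candidate):
        return None
    diffs = [i for i, (a, b) in enumerate(zip(typo, candidate)) if a != b]
    if len(diffs) == 1:
        return SUBSTITUTION_COST
    if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
            and typo[diffs[0]] == candidate[diffs[1]]
            and typo[diffs[1]] == candidate[diffs[0]]):
        return TRANSPOSITION_COST
    return None


def ranked_candidates(typo, vocabulary):
    """Return (cost, word) pairs for plausible corrections, cheapest first."""
    heap = []
    for word in vocabulary:
        cost = edit_cost(typo, word)
        if cost is not None:
            heapq.heappush(heap, (cost, word))
    return [heapq.heappop(heap) for _ in range(len(heap))]


print(ranked_candidates("folor", {"floor", "color", "flour"}))
# [(1.0, 'floor'), (1.5, 'color')]
```

With these assumed weights "floor" is pulled from the queue before "color". Giving substitutions of keyboard neighbours a lower cost would be one way to bring the two closer together.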

Note: "nto" could also be corrected to "ton". To rule that out, you would have to check grammar as well.
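To see the ambiguity concretely, the sketch below treats any rearrangement of a word's letters as a possible correction, which is an assumption about what "jumbled letters" means here. It indexes a hypothetical word list by each word's sorted letters, so every word that is an anagram of the typo comes back as a candidate. The names `build_anagram_index`, `anagram_candidates` and `VOCABULARY` are made up for this example.

```python
from collections import defaultdict


def build_anagram_index(vocabulary):
    """Map each sorted-letter signature to the known words that share it."""
    index = defaultdict(list)
    for word in sorted(vocabulary):
        index["".join(sorted(word))].append(word)
    return index


# Hypothetical word list that happens to contain both "not" and "ton".
VOCABULARY = {"this", "is", "not", "ton", "a", "correct", "sentence"}
INDEX = build_anagram_index(VOCABULARY)


def anagram_candidates(typo):
    """Return all known words made of exactly the same letters as `typo`."""
    return INDEX.get("".join(sorted(typo)), [])


print(anagram_candidates("nto"))  # ['not', 'ton']
```

Both words come back with nothing to choose between them, so the letters alone cannot decide. Some extra signal, such as the grammar check mentioned above, is needed to pick "not" in this sentence.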
