I am intending to use the n-gram code from this article. The algorithm produces these tri-gram results:
t, th, the, he, e, q, qu, qui, uic, ick, ck, k, r, re, red, ed, d
for the text the quick red
However, Wikipedia reckons it should be:
the, he_, e_q, _qu, qui, uic, ick, ck_, k_r, _re, red
(space indicated by ‘_’).
Which is correct? Are there any other C# implementations out there?
The second example is correct.
P.S. Why do you generate trigrams for the complete text and not only for words? What is your use case?
The first is correct. I used character n-grams in my thesis. You move forward one character at each step. That way, similar words can still be found.
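For comparison, here is a minimal sketch of both variants. It is in Python rather than C#, and the function names `char_ngrams` and `word_edge_ngrams` are made up for illustration; neither is taken from the article or from Wikipedia. The first slides a fixed window over the whole text, one character per step, which gives the Wikipedia-style list. The second assumes the article's code pads each word separately and then trims the padding, which is one way to end up with the shorter edge grams (t, th, ..., ed, d) shown in the question.

```python
def char_ngrams(text, n=3, space_marker="_"):
    """Slide a window of n characters over the whole text, one step at a time.

    Spaces are replaced by space_marker so they stay visible in the output.
    """
    marked = text.replace(" ", space_marker)
    return [marked[i:i + n] for i in range(len(marked) - n + 1)]


def word_edge_ngrams(text, n=3):
    """Build n-grams per word, padding each word with n-1 blanks on both sides.

    The padding is stripped from each gram afterwards, so grams at the
    start and end of a word come out shorter than n characters.
    (Assumed behaviour, chosen to reproduce the first list in the question.)
    """
    grams = []
    for word in text.split():
        padded = " " * (n - 1) + word + " " * (n - 1)
        for i in range(len(padded) - n + 1):
            gram = padded[i:i + n].strip()
            if gram:
                grams.append(gram)
    return grams


if __name__ == "__main__":
    print(char_ngrams("the quick red"))
    print(word_edge_ngrams("the quick red"))
```

On "the quick red" the first function yields the 11 trigrams from the Wikipedia example and the second yields the 17 grams from the article's output. Which one you want depends on the use case: the per-word version never mixes characters from neighbouring words, while the whole-text version keeps word boundaries as part of the grams.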