I am trying to implement the Porter stemming algorithm, but I got stuck at this point:
where the square brackets denote arbitrary presence of their contents. Using (VC){m} to denote VC repeated m times, this may again be written as
[C](VC){m}[V].
m will be called the "measure" of any word or word part when represented in this form. The case m = 0 covers the null word. Here are some examples:
m=0 TR, EE, TREE, Y, BY.
m=1 TROUBLE, OATS, TREES, IVY.
m=2 TROUBLES, PRIVATE, OATEN, ORRERY.
I don't understand what this "measure" is. What does it stand for?
It looks like the measure is the number of times a run of vowels is immediately followed by a run of consonants, i.e. the number of (VC) groups in the word. For example,
"TROUBLES" has:
Optional initial consonants [C] = "TR".
First vowels-consonants group (VC) = "OUBL".
Second vowels-consonants group (VC) = "ES".
Optional ending vowels [V] is empty.
So the measure is two, the number of times (VC) was "matched".
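The same counting can be sketched in Python. This is only an illustration, not a full stemmer: the function names `is_consonant` and `measure` are made up here, and the treatment of Y follows the rule in Porter's paper that Y counts as a vowel only when it is preceded by a consonant.

```python
def is_consonant(word, i):
    """Return True if word[i] is a consonant under Porter's definition."""
    ch = word[i].lower()
    if ch in "aeiou":
        return False
    if ch == "y":
        # Y is a consonant at the start of the word or after a vowel,
        # and a vowel when the preceding letter is a consonant.
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(word):
    """Count the (VC) groups in word, i.e. m in [C](VC){m}[V]."""
    m = 0
    prev_was_vowel = False
    for i in range(len(word)):
        consonant = is_consonant(word, i)
        # Each switch from a vowel run to a consonant run closes one (VC) group.
        if consonant and prev_was_vowel:
            m += 1
        prev_was_vowel = not consonant
    return m


if __name__ == "__main__":
    for w in ["TR", "TREE", "BY", "TROUBLE", "IVY", "TROUBLES", "ORRERY"]:
        print(w, measure(w))
```

Counting vowel-to-consonant transitions avoids building the groups explicitly; for "TROUBLES" the two transitions are U→B and E→S, which gives m = 2 as in the breakdown above.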