Java literate text word parsing regexp

开发者 (Developer) https://www.devze.com 2023-03-01 07:00 Source: web
At first I was happy with [A-Za-z]+. Now I need to parse words that end with the letter "s", but I should skip words whose first 2 or more letters are upper-case.

I tried something like [\n\\ ][A-Za-z]{0,1}[a-z]*s[ \\.\\,\\?\\!\\:]+ but the first part of it, [\n\\ ], for some reason doesn't match at the beginning of the line.

Here is an example.

The text is: Denis goeS to school every day!

But the only parsed word is goeS.

Any ideas?


What about

\b[A-Z]?[a-z]*s\b

The \b is a word boundary, which I assume is what you wanted. The ? is the shorter form of {0,1}.
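To see why the word boundary helps, here is a small sketch that compares the two approaches on the sentence from the question. The class name BoundaryDemo and the helper findAll are made up for illustration, the capture group and the [sS] ending are added so the output is easy to compare, and the leading class is written as [\n ] on the assumption that the question meant "newline or space". A leading character class has to consume a real character, so the first word of the input, which has nothing before it, can never match; \b consumes nothing and also matches at the very start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: collects group 1 of every match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Denis goeS to school every day!";

        // Leading class must consume a space or newline, so "Denis" is missed.
        System.out.println(findAll("[\\n ]([A-Z]?[a-z]*[sS])[ .,?!:]+", text)); // [goeS]

        // \b is zero-width and also matches at the start of the input.
        System.out.println(findAll("\\b([A-Z]?[a-z]*[sS])\\b", text)); // [Denis, goeS]
    }
}
```

If the input really is split into lines, compiling the original pattern with an alternative such as (?:^|\s) and Pattern.MULTILINE would be another option, but \b is the shorter fix.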


Try this:

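import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Inside a method, e.g. main:
Pattern p = Pattern.compile("\\b([A-Z]?[a-z]*[sS])\\b");
Matcher m = p.matcher("Denis goeS to school every day!");
while (m.find()) {
  System.out.println(m.group(1)); // prints Denis, then goeS
}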

The regex matches every word that starts with at most one upper-case letter, contains only lower-case letters in the middle, and ends with either s or S. Because the final [sS] may itself be upper case, a two-letter word such as AS still matches.

In your example this would match Denis and goeS. If you want to match only an upper-case S, change the expression to "\\b([A-Z]?[a-z]*[S])\\b", which would match goeS and GoeS but not GOeS, gOeS or goES.
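The claim about the upper-case-only variant can be checked word by word. This is only a sketch: the class name UpperSCheck and the method isWholeWordMatch are illustrative names, and it uses Matcher.matches() so that each candidate has to match the pattern as a whole.

```java
import java.util.regex.Pattern;

public class UpperSCheck {
    // Variant from the answer that accepts only an upper-case final S.
    static final Pattern UPPER_S = Pattern.compile("\\b([A-Z]?[a-z]*[S])\\b");

    // Hypothetical helper: true if the whole word matches the pattern.
    static boolean isWholeWordMatch(String word) {
        return UPPER_S.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = {"goeS", "GoeS", "GOeS", "gOeS", "goES"};
        for (String w : words) {
            // goeS and GoeS print true; GOeS, gOeS and goES print false.
            System.out.println(w + " -> " + isWholeWordMatch(w));
        }
    }
}
```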
