Matching sentences with regex in Java

Developer https://www.devze.com 2022-12-26 20:23 Source: the web
I'm using the Scanner class in Java to go through a text file and extract each sentence. I'm setting the delimiter on my Scanner (via useDelimiter) to this regex:

Pattern.compile("[\\w]*[\\.|?|!][\\s]")

This currently seems to work, but it leaves the whitespace at the end of the sentence. Is there an easy way to match the whitespace at the end but not include it in the result?

I realize this is probably an easy question but I've never used regex before so go easy :)


Try this:

"(?<=[.!?])\\s+"

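This uses a lookbehind to match \\s+ only where it is preceded by one of [.!?]. The whitespace is consumed as the delimiter, while the punctuation is only checked, not matched, so it stays attached to the sentence.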
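For illustration, here is a minimal sketch of that delimiter plugged into a Scanner. The class name SentenceScanner, the sentences helper and the sample text are made up for this example; a String source stands in for the text file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class SentenceScanner {
    // Hypothetical helper: returns each sentence with its closing
    // punctuation, without the whitespace that separated the sentences.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // Delimiter: whitespace that directly follows '.', '!' or '?'.
            scanner.useDelimiter("(?<=[.!?])\\s+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Hello there.], [How are you?] and [Fine!] on separate lines.
        for (String s : sentences("Hello there. How are you? Fine!")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

For a real file, new Scanner(new File(...)) should be usable the same way. Note that abbreviations such as "e.g. " would also be treated as sentence ends.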


If you want to remove the punctuation as well, just include it in the match:

"[.!?]+\\s+"

This will split "ORLY!?!? LOL" into "ORLY" and "LOL".
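A quick way to check this is String.split with the same regex. This is only a sketch; the class name PunctuationSplit and the split helper are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

public class PunctuationSplit {
    // Hypothetical helper: splits on a run of '.', '!' or '?' followed by
    // whitespace, so the punctuation is dropped along with the whitespace.
    static List<String> split(String text) {
        return Arrays.asList(text.split("[.!?]+\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(split("ORLY!?!? LOL")); // prints [ORLY, LOL]
    }
}
```

Because the regex requires whitespace after the punctuation, a final sentence that ends the input with no trailing whitespace keeps its punctuation: "Hi. Bye!" comes back as "Hi" and "Bye!".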


What you're looking for is a positive lookahead. This should do it:

Pattern.compile("\\w*[.?!](?=\\s)")
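As a rough sketch of the lookahead idea applied with a Matcher instead of a delimiter: since \\w* does not match spaces, the pattern above only matches the last word of each sentence, so the example below uses an adapted pattern. That pattern is an assumption of this sketch, not part of the answer. It widens the match to the whole sentence and also accepts end of input. The class and method names are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadSentences {
    // Adapted pattern (an assumption, not the one from the answer):
    // a non-space character, then everything up to the closing punctuation,
    // which must be followed by whitespace or the end of the input.
    // The lookahead checks for that whitespace without including it.
    private static final Pattern SENTENCE =
            Pattern.compile("\\S[^.!?]*[.!?]+(?=\\s|$)");

    // Hypothetical helper: collects every match of SENTENCE in the text.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [Hello there.], [How are you?] and [Fine!] on separate lines.
        for (String s : sentences("Hello there. How are you? Fine!")) {
            System.out.println("[" + s + "]");
        }
    }
}
```

Unlike the delimiter approach, text after the last punctuation mark is not returned here, because every match has to end in '.', '!' or '?'.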