I'm using the Scanner class in Java to go through a text file and extract each sentence. I'm calling the useDelimiter method on my Scanner with the regex:
Pattern.compile("[\\w]*[\\.|?|!][\\s]")
This currently seems to work, but it leaves the whitespace at the end of the sentence. Is there an easy way to match the whitespace at the end but not include it in the result?
I realize this is probably an easy question, but I've never used regex before, so go easy :)
Try this:
"(?<=[.!?])\\s+"
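This uses a lookbehind (one kind of lookaround) to match \\s+ only where it is preceded by one of [.!?], so the punctuation stays with the sentence and only the whitespace is consumed as the delimiter.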
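Here is a minimal sketch of how that delimiter might be plugged into a Scanner. The class name SentenceSplitDemo and the helper sentences are made up for illustration, and the Scanner reads from a String rather than a file to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class SentenceSplitDemo {
    // Hypothetical helper: collects every token the Scanner produces
    // when whitespace following [.!?] is used as the delimiter.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        try (Scanner scanner = new Scanner(text)) {
            // The lookbehind checks for the punctuation without consuming it,
            // so only the whitespace is dropped from the tokens.
            scanner.useDelimiter("(?<=[.!?])\\s+");
            while (scanner.hasNext()) {
                result.add(scanner.next());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Each sentence keeps its punctuation but loses the trailing whitespace.
        System.out.println(sentences("Hello there. How are you? Fine!"));
    }
}
```

For a real file you would presumably construct the Scanner from a File or Path instead of a String; the delimiter itself stays the same.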
If you want to remove the punctuation as well, then just include it as part of the match:
"[.!?]+\\s+"
This will split "ORLY!?!? LOL" into "ORLY" and "LOL".
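As a quick check of that claim, here is a small sketch using String.split with the same pattern. The class name PunctuationSplitDemo and the helper splitSentences are hypothetical names chosen for this example.

```java
import java.util.Arrays;

class PunctuationSplitDemo {
    // Hypothetical helper: splits on a run of sentence punctuation
    // followed by whitespace, discarding both.
    static String[] splitSentences(String text) {
        return text.split("[.!?]+\\s+");
    }

    public static void main(String[] args) {
        // prints [ORLY, LOL]
        System.out.println(Arrays.toString(splitSentences("ORLY!?!? LOL")));
    }
}
```

Note that the pattern requires whitespace after the punctuation, so punctuation at the very end of the input (with nothing after it) is not removed.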
What you're looking for is a positive lookahead. This should do it:
Pattern.compile("\\w*[.?!](?=\\s)")