Simple Java regex not working

Developer https://www.devze.com 2023-01-22 04:09 Source: web
I have this regex, which is supposed to remove sentence delimiters (. and ?):

sentence = sentence.replaceAll("\\.|\\?$","");

It works fine on simple sentences. It converts:

"I am Java developer." to "I am Java developer"

"Am I a Java developer?" to "Am I a Java developer"

But after deployment we found that it also removes every other dot in the sentence. For example:

"Hi.Am I a Java developer?" becomes "HiAm I a Java developer"

Why is this happening?


The pipe (|) has the lowest precedence of all operators. So your regex:

\\.|\\?$

is being treated as:

(\\.)|(\\?$)

which matches a . anywhere in the string and matches a ? at the end of the string.

To fix this you need to group the . and ? together as:

(?:\\.|\\?)$

You could also use:

[.?]$

Within a character class . and ? are treated literally so you need not escape them.
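The difference between the three patterns above can be checked side by side. This is only a minimal sketch: the class and method names (AlternationPrecedenceDemo, stripBuggy, stripGrouped, stripCharClass) are made up for illustration, and the input is the sentence from the question.

```java
class AlternationPrecedenceDemo {
    // Original pattern: "." anywhere OR "?" at the end of the string.
    static String stripBuggy(String sentence) {
        return sentence.replaceAll("\\.|\\?$", "");
    }

    // Grouped alternation: the $ anchor now applies to both alternatives.
    static String stripGrouped(String sentence) {
        return sentence.replaceAll("(?:\\.|\\?)$", "");
    }

    // Character class: same effect as the grouped version, no escaping needed.
    static String stripCharClass(String sentence) {
        return sentence.replaceAll("[.?]$", "");
    }

    public static void main(String[] args) {
        String input = "Hi.Am I a Java developer?";
        System.out.println(stripBuggy(input));     // HiAm I a Java developer
        System.out.println(stripGrouped(input));   // Hi.Am I a Java developer
        System.out.println(stripCharClass(input)); // Hi.Am I a Java developer
    }
}
```

Only the first variant touches the dot after "Hi", because its first alternative is not anchored.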


What you're saying with "\\.|\\?$" is "either a period" or "a question mark as the last character".

I would recommend "[.?]$" instead in order to avoid the confusing escaping (and undesirable result, of course).


Your problem is because of the low precedence of the alternation operator |. Your regular expression means match one of:

  • . anywhere or
  • ? at the end of the string.

Use a character class instead:

"[.?]$"
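To see which parts of the input each pattern actually matches, one option is to list the matches with java.util.regex.Matcher. The sketch below assumes a hypothetical helper, MatchLister.matches, that records each match together with its start index; it is not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchLister {
    // Returns every match of regex in input, formatted as "text@startIndex".
    static List<String> matches(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group() + "@" + m.start());
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "Hi.Am I a Java developer?";
        // The unanchored "." alternative matches in the middle of the string.
        System.out.println(matches("\\.|\\?$", input)); // [.@2, ?@24]
        // The character class is anchored as a whole, so only the last character matches.
        System.out.println(matches("[.?]$", input));    // [?@24]
    }
}
```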


You forgot to enclose the sentence-ending characters in parentheses:

sentence = sentence.replaceAll("(\\.|\\?)$","");

The better approach is to use [.?]$ like @Mark Byers suggested.

sentence = sentence.replaceAll("[.?]$","");
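If this cleanup runs on many sentences, the pattern could be compiled once and reused, since String.replaceAll compiles its regex on every call. A possible sketch, where SentenceCleaner and stripTrailingDelimiter are hypothetical names chosen for this example:

```java
import java.util.regex.Pattern;

class SentenceCleaner {
    // Compiled once; matches a single "." or "?" at the end of the string.
    private static final Pattern TRAILING_DELIMITER = Pattern.compile("[.?]$");

    // Removes one trailing "." or "?" and leaves all other characters alone.
    static String stripTrailingDelimiter(String sentence) {
        return TRAILING_DELIMITER.matcher(sentence).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingDelimiter("I am Java developer."));      // I am Java developer
        System.out.println(stripTrailingDelimiter("Hi.Am I a Java developer?")); // Hi.Am I a Java developer
    }
}
```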