I have this regex, which is supposed to remove sentence delimiters (. and ?):
sentence = sentence.replaceAll("\\.|\\?$","");
It works fine; it converts
"I am Java developer."
to "I am Java developer"
"Am I a Java developer?"
to "Am I a Java developer"
But after deployment we found that it also removes any other dots in the sentence, for example:
"Hi.Am I a Java developer?"
becomes "HiAm I a Java developer"
Why is this happening?
The pipe (|) has the lowest precedence of all regex operators. So your regex:
\\.|\\?$
is being treated as:
(\\.)|(\\?$)
which matches a . anywhere in the string and matches a ? only at the end of the string.
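The effect of that implicit grouping can be checked with a small sketch. The class name AlternationPrecedenceDemo and the helper stripUngrouped are hypothetical names made up for this illustration; only String.replaceAll and the pattern come from the question.

```java
// Hypothetical demo class; the names are illustrative, not from the question.
class AlternationPrecedenceDemo {

    // The original pattern: "\\." may match anywhere,
    // while "\\?$" is the only branch anchored to the end.
    static String stripUngrouped(String sentence) {
        return sentence.replaceAll("\\.|\\?$", "");
    }

    public static void main(String[] args) {
        // Every dot is removed, plus the trailing question mark.
        System.out.println(stripUngrouped("Hi.Am I a Java developer?")); // HiAm I a Java developer
        System.out.println(stripUngrouped("Is it v1.2?"));               // Is it v12

        // A question mark in the middle survives, because only that branch carries the $.
        System.out.println(stripUngrouped("Really? Yes."));              // Really? Yes
    }
}
```

The last call shows the asymmetry: the ? branch is anchored, the . branch is not.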
To fix this, you need to group the . and ? together as:
(?:\\.|\\?)$
You could also use:
[.?]$
Within a character class, . and ? are treated literally, so you need not escape them.
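As a rough check, both fixes can be run side by side on the inputs from the question. The class TrailingDelimiterDemo and its two helper methods are hypothetical names used only for this sketch.

```java
// Hypothetical demo class; the names are illustrative, not from the question.
class TrailingDelimiterDemo {

    // Non-capturing group: the $ now applies to both alternatives.
    static String stripGrouped(String sentence) {
        return sentence.replaceAll("(?:\\.|\\?)$", "");
    }

    // Character class: . and ? are literal inside [...], so no escaping is needed.
    static String stripCharClass(String sentence) {
        return sentence.replaceAll("[.?]$", "");
    }

    public static void main(String[] args) {
        // The inner dot is kept; only the trailing delimiter is removed.
        System.out.println(stripGrouped("Hi.Am I a Java developer?"));   // Hi.Am I a Java developer
        System.out.println(stripCharClass("Hi.Am I a Java developer?")); // Hi.Am I a Java developer
        System.out.println(stripCharClass("I am Java developer."));      // I am Java developer
    }
}
```

Both patterns behave the same here; the character class is simply shorter and avoids the double backslashes.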
What you're saying with "\\.|\\?$" is "either a period anywhere" or "a question mark as the last character".
I would recommend "[.?]$" instead, in order to avoid the confusing escaping (and the undesirable result, of course).
Your problem is caused by the low precedence of the alternation operator |. Your regular expression means: match either a . anywhere or a ? at the end of a line.
Use a character class instead:
"[.?]$"
You forgot to enclose the sentence-ending characters in parentheses:
sentence = sentence.replaceAll("(\\.|\\?)$","");
The better approach is to use [.?]$ as @Mark Byers suggested.
sentence = sentence.replaceAll("[.?]$","");