I have the following code:
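import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        StringBuilder content = new StringBuilder("abcd efg h i. - – jk(lmn) qq zz.");
        // meant to match '.', '-' or '–' followed by end of input or a space
        String patternSource = "[.-–]($| )";
        Pattern pattern = Pattern.compile(patternSource);
        Matcher matcher = pattern.matcher(content);
        System.out.println(matcher.replaceAll(""));
    }
}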
where the character class in patternSource consists of a dot, a minus sign and the \u2013 character (an en dash). On execution it gives me
abcefi- jk(lmn) qzz
If I change the order of the symbols in my character class in any way, it begins to work normally and gives
abcd efg h i jk(lmn) qq zz
What the hell?
Tested under JDK/JRE 1.6.0_23
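If you have an unescaped hyphen in a character class, it has a special meaning: it denotes a range of characters. For example, [A-Z] means all the characters between A and Z.
In your class [.-–], the hyphen sits between '.' (U+002E) and '–' (U+2013), so the class is read as the whole range U+002E to U+2013. That range includes all ASCII digits and letters but not the hyphen itself (U+002D), which is why every letter followed by a space was removed while the '-' survived.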
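A small sketch to confirm the range interpretation, checking single characters against the question's class. RangeDemo is a hypothetical class name, and the en dash is written as the \u2013 escape (an assumption made only so the result does not depend on the source file encoding):

```java
import java.util.regex.Pattern;

public class RangeDemo {
    public static void main(String[] args) {
        // The question's class: '.', '-' and the en dash (written as an escape).
        Pattern cls = Pattern.compile("[.-\u2013]");
        // Because the hyphen sits between '.' (U+002E) and the en dash (U+2013),
        // the class is really the whole range U+002E..U+2013.
        char[] samples = {'.', 'a', 'Z', '7', '\u2013', '-', ' '};
        for (char c : samples) {
            boolean hit = cls.matcher(String.valueOf(c)).matches();
            // Print code points rather than the characters to avoid console encoding issues.
            System.out.printf("U+%04X %s%n", (int) c, hit ? "matches" : "does not match");
        }
    }
}
```

Letters and digits report "matches", while the hyphen (U+002D) and the space report "does not match".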
An exception to the range rule is when the hyphen is at the start or end of the character class, in which case it is treated literally and matches only a hyphen.
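A sketch of three ways to make the hyphen literal, applied to the question's input: hyphen first, hyphen last, or escaped (\\- inside a Java string literal, so the regex engine sees \-). FixedPattern is a hypothetical class name, and the en dash is again written as \u2013:

```java
import java.util.regex.Pattern;

public class FixedPattern {
    public static void main(String[] args) {
        String content = "abcd efg h i. - \u2013 jk(lmn) qq zz.";
        // Each class contains exactly '.', '-' and the en dash.
        String[] sources = {
            "[-.\u2013]($| )",   // hyphen first: literal
            "[.\u2013-]($| )",   // hyphen last: literal
            "[.\\-\u2013]($| )"  // hyphen escaped: literal
        };
        for (String src : sources) {
            System.out.println(Pattern.compile(src).matcher(content).replaceAll(""));
        }
    }
}
```

All three print the same line. Note that ". ", "- " and "– " are each removed together with their trailing space, so with this exact input "i" and "jk" end up adjacent ("abcd efg h ijk(lmn) qq zz").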