
Regex error when the first character is \, but it works fine with any other character

Developer https://www.devze.com 2023-04-06 21:38 Source: web

For a project I am working on, I have to read a String. This String may contain one or more hexadecimal representations of Unicode characters (e.g. "\u0161" for "š"). I want to convert these codes to the characters they represent.

To do this, I first need to detect that there is a hexadecimal sequence of the format "\uAAAA" in my String, and therefore I wrote the following regular expression:

Pattern classPattern = Pattern.compile("\\u[0-9a-fA-F]{4}");
Matcher classMatcher = classPattern.matcher("\\u1ECD");
System.out.println(classMatcher.find());

Unfortunately, this throws a "java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 2" error.

However, if I replace the "\" with an "@", just for testing purposes, the regex works as expected:

Pattern classPattern = Pattern.compile("@u[0-9a-fA-F]{4}");
Matcher classMatcher = classPattern.matcher("@u1ECD");
System.out.println(classMatcher.find());

This leads me to believe that I am doing something wrong with the backslash. I also tried many other sequences, but none of them worked. Please help.


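In a Java regex, '\u' has a special meaning: it begins a Unicode escape of the form \uhhhh, which matches the character with that hexadecimal code. The Java literal "\\u[0-9a-fA-F]{4}" hands the regex engine \u followed by "[", which is not four hex digits, hence the exception near index 2. To match a literal backslash, you have to escape the \ twice, once for the Java string literal and once for the regex: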

Pattern classPattern = Pattern.compile("\\\\u[0-9a-fA-F]{4}");
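To make the two escaping levels concrete, here is a minimal sketch that runs the corrected pattern against the inputs from the question. The class name BackslashEscapeDemo and the helper containsEscape are made up for illustration and are not part of the original post.

```java
import java.util.regex.Pattern;

public class BackslashEscapeDemo {
    // Java literal with four backslashes -> regex with two backslashes
    // -> matches one literal backslash, then 'u', then four hex digits.
    static final Pattern ESCAPE = Pattern.compile("\\\\u[0-9a-fA-F]{4}");

    // Hypothetical helper: true if the input contains an escaped code.
    public static boolean containsEscape(String input) {
        return ESCAPE.matcher(input).find();
    }

    public static void main(String[] args) {
        // The literal below is the six characters: backslash, u, 1, E, C, D.
        System.out.println(containsEscape("\\u1ECD"));
        // No backslash in the input, so the corrected pattern does not match.
        System.out.println(containsEscape("@u1ECD"));
    }
}
```

With these inputs the first call should report a match and the second should not, since the pattern now insists on a literal backslash.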

[update] As comments have pointed out, my reasoning for giving the right answer was flawed.
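Since the question's actual goal is to convert the codes to characters, the corrected pattern could be extended with a capture group along the following lines. This is only a sketch under assumptions: the class name UnicodeEscapeDecoder and the method decode are hypothetical, and it assumes every escape has exactly four hex digits (characters outside the Basic Multilingual Plane would have to appear as two consecutive escapes forming a surrogate pair).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {
    // Same pattern as in the answer, with a capture group around the hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Hypothetical helper: replaces each escaped code with the character it names.
    public static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the four hex digits into a UTF-16 code unit.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' and backslash in the result.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The input literal contains a real backslash followed by u0161.
        System.out.println(decode("Mi\\u0161ko"));
    }
}
```

StringBuffer is used here because the appendReplacement overload that accepts a StringBuilder only exists from Java 9 onward; text without any escapes passes through unchanged.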
