For a project I am working on, I have to read a String. This String may contain one or more hexadecimal representations of Unicode characters (e.g. "\u0161" for "š"). I want to convert these codes to the characters they represent.
To do this, I first need to detect a hexadecimal sequence of the format "\uAAAA" in my String, and therefore I wrote the following regular expression:
Pattern classPattern = Pattern.compile("\\u[0-9a-fA-F]{4}");
Matcher classMatcher = classPattern.matcher("\\u1ECD");
System.out.println(classMatcher.find());
Unfortunately, this throws a "java.util.regex.PatternSyntaxException: Illegal Unicode escape sequence near index 2" error.
However, if I replace the "\" with an "@", just for testing purposes, the regex works as expected:
Pattern classPattern = Pattern.compile("@u[0-9a-fA-F]{4}");
Matcher classMatcher = classPattern.matcher("@u1ECD");
System.out.println(classMatcher.find());
This leads me to believe that I am doing something wrong with the backslash. I also tried many other sequences, but none of them worked. Please help.
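In a Java regex, \u introduces a Unicode escape of the form \uhhhh, so it must be followed by four hexadecimal digits. In your pattern it is followed by "[", which is why the regex compiler reports an illegal Unicode escape. To match a literal backslash, the regex itself needs \\, and each of those backslashes has to be escaped again in the Java string literal: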
Pattern classPattern = Pattern.compile("\\\\u[0-9a-fA-F]{4}");
[update] As comments have pointed out, my reasoning for giving the right answer was flawed.
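For the conversion step the question asks about, here is a minimal sketch built on the corrected pattern. The class name UnicodeEscapeDecoder and the method decode are hypothetical names chosen for illustration, and the sketch assumes every "\uXXXX" sequence in the input should be replaced by the single UTF-16 code unit it names; it does not treat an escaped backslash in front of the sequence specially.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
public class UnicodeEscapeDecoder {

    // Four backslashes in the Java literal become two in the regex,
    // which match one literal backslash in the input.
    // The capturing group holds the four hex digits.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        // StringBuffer keeps this compatible with Java 8;
        // Java 9+ also accepts a StringBuilder here.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the hex digits and turn them into one char.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or a backslash
            // being interpreted in the replacement string.
            m.appendReplacement(out,
                    Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The input holds a literal backslash, 'u' and hex digits.
        System.out.println(decode("\\u0041\\u0042C")); // prints ABC
    }
}
```

Code points outside the Basic Multilingual Plane arrive as two consecutive escapes (a surrogate pair), and decoding each one separately as above yields the correct pair in the resulting String.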