I am getting the following encoded HTML as a JSON response and have no idea how to decode it to a normal HTML string (it is an anchor tag, by the way).
x3ca hrefx3dx22http:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx22x3ehttp:\/\/wordnetweb.princeton.edu\/perl\/webwn?sx3dstrandx3c\/ax3e
I have tried java.net.URLDecoder.decode without any luck.
The term you are searching for is "UTF8 Code Units". These code units are basically a backslash, followed by an "x" and a two-digit hex ASCII code. I wrote a little converter method for you:
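public static String convertUTF8Units(String input) {
    String output = input;
    // Slide a 4-character window over the input, looking for \xNN
    for (int i = 0; i <= input.length() - 4; i++) {
        String part = input.substring(i, i + 4);
        if (part.startsWith("\\x")) {
            // Parse the two hex digits into a single byte value
            byte[] rawByte = new byte[1];
            rawByte[0] = (byte) (Integer.parseInt(part.substring(2), 16) & 0x000000FF);
            // Uses the platform default charset, which is fine for ASCII values
            String raw = new String(rawByte);
            // Replace every occurrence of this escape in the output
            output = output.replace(part, raw);
        }
    }
    return output;
}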
I know, it's a bit frowzy, but it works :)
That's not an encoding I've seen before, but it looks like xYZ (where Y and Z are hex digits [0-9a-f]) means "the character whose ASCII code is 0xYZ". I'm not sure how the letter x itself would be encoded, so I would recommend trying to find out. But then you can just do a find and replace on the regex x([0-9a-f]{2}), by getting the integer represented by the two hex digits, and then casting it to a char (or something similar to that).
Then also, it looks like slashes (and other characters? See if you can find out...) always have a backslash in front of them, so do another find-and-replace for that.
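The two find-and-replace passes described above could be sketched roughly as follows. This is only a sketch under a few assumptions: that each escape really is a backslash followed by "x" and two hex digits (as the other answer states), that each value maps directly to one character, and that "\/" is the only other escape that needs undoing. The class and method names (HexEscapeDecoder, decode) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexEscapeDecoder {
    // Assumption: an escape is a backslash, an "x", and exactly two hex digits, e.g. \x3c
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9a-fA-F]{2})");

    public static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Convert the two hex digits to an int and cast it to a char
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded "\" or "$" from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        // Second pass: drop the backslash in front of slashes ("\/" becomes "/")
        return out.toString().replace("\\/", "/");
    }
}
```

With the string from the question (backslashes included), HexEscapeDecoder.decode(...) should give back the plain anchor tag. Because the input is scanned only once, a decoded backslash is not picked up again as the start of a new escape.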
Thanks!!
Take care: in the for loop the operator must be "<=", otherwise an encoded character at the very end of the input can't be decoded.
for(int i=0;i<=input.length()-4;i++) {..}
Cheers!
This works for me
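// Needs: import java.net.URLDecoder; import java.io.UnsupportedEncodingException;
public static String convertUTF8Units_version2(String input) throws UnsupportedEncodingException
{
    // Turn each \xNN into %NN and let URLDecoder do the decoding.
    // Note: URLDecoder also turns "+" into a space and rejects a stray "%".
    return URLDecoder.decode(input.replaceAll("\\\\x", "%"), "UTF-8");
}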
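One thing to watch with the URLDecoder approach is that URLDecoder treats "+" and "%" specially, so text that already contains those characters can come out changed or trigger an exception. A possible variation, sketched here with made-up names (UrlDecoderHexEscapes, decode), protects them before converting the escapes. It still assumes every "\x" is followed by two hex digits.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecoderHexEscapes {
    public static String decode(String input) {
        // Protect characters that URLDecoder treats specially.
        // "%" goes first so the "%2B" added for "+" is not escaped again.
        String prepared = input
                .replace("%", "%25")
                .replace("+", "%2B")
                .replace("\\x", "%");
        try {
            // Consecutive %NN bytes are interpreted together as UTF-8
            return URLDecoder.decode(prepared, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8
            throw new IllegalStateException(e);
        }
    }
}
```

Since the bytes are decoded as UTF-8, a multi-byte sequence such as \xc3\xa9 comes out as a single character. A "\x" followed by something other than two hex digits would still make URLDecoder throw an IllegalArgumentException, and "\/" is left untouched.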