Decoding HTML returned as JSON response - Android

Developer (https://www.devze.com) 2023-01-17 18:37, source: web
I am getting the following encoded HTML as a JSON response and have no idea how to decode it to a normal HTML string (it is an anchor tag, by the way).

\x3ca href\x3d\x22http:\/\/wordnetweb.princeton.edu\/perl\/webwn?s\x3dstrand\x22\x3ehttp:\/\/wordnetweb.princeton.edu\/perl\/webwn?s\x3dstrand\x3c\/a\x3e

I have tried java.net.URLDecoder.decode without any luck.
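A quick way to see why URLDecoder does not help here: it only decodes %HH percent escapes (and turns '+' into a space), so a \xHH sequence comes out unchanged. This is a minimal sketch; the class name UrlDecoderDemo and the helper urlDecode are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UrlDecoderDemo {
    // Hypothetical helper: wraps URLDecoder.decode so callers need not handle
    // the checked exception (UTF-8 is always supported, so it should not occur).
    static String urlDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Percent escapes (%HH) are what URLDecoder understands...
        System.out.println(urlDecode("%3ca href%3d%22x%22%3e")); // <a href="x">
        // ...while a backslash-x escape is passed through untouched.
        System.out.println(urlDecode("\\x3ca href\\x3d\\x22x\\x22\\x3e"));
    }
}
```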


What you are looking at are hex escape sequences (in the style of JavaScript's \xHH), which are sometimes loosely called "UTF8 code units". Each one is a backslash, followed by an "x" and a two-digit hex ASCII code. I wrote a little converter method for you:

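public static String convertUTF8Units(String input) {
    String output = input;
    // Look at every 4-character window of the input, e.g. "\x3c"
    for (int i = 0; i <= input.length() - 4; i++) {
        String part = input.substring(i, i + 4);
        String hex = part.substring(2);
        // Only handle a backslash + 'x' followed by two real hex digits
        if (part.startsWith("\\x") && hex.matches("[0-9a-fA-F]{2}")) {
            // Cast the code straight to a char instead of using new String(byte[]),
            // which would depend on the platform's default charset
            char raw = (char) Integer.parseInt(hex, 16);
            // Replace every occurrence of this escape in the output
            output = output.replace(part, String.valueOf(raw));
        }
    }

    return output;
}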

I know, it's a bit frowzy, but it works :)


That's not an encoding I've seen before, but it looks like xYZ (where Y and Z are hex digits [0-9a-f]) means "the character whose ASCII code is 0xYZ". I'm not sure how the letter x itself would be encoded, so I would recommend finding that out. Then you can do a find-and-replace on the regex x([0-9a-f]{2}): parse the two hex digits as an integer and cast it to a char (or something similar).
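The find-and-replace described above can be sketched in Java with java.util.regex. This assumes each escape is written as a backslash, an x and two hex digits (\x3c), which is what the converter methods in the other answers assume. If your response really contains bare x3c, drop the leading \\\\ from the pattern. HexEscapeDecoder is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HexEscapeDecoder {
    // Assumed format: a literal backslash, an 'x' and exactly two hex digits
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9a-fA-F]{2})");

    public static String decodeHexEscapes(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the two hex digits and cast the code to a char
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops a decoded '\' or '$' from being
            // interpreted as replacement syntax
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeHexEscapes("\\x3ca href\\x3d\\x22#\\x22\\x3e")); // <a href="#">
    }
}
```

Matching with a regex also avoids the hand-written window loop, so there is no loop bound to get wrong.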

Also, it looks like slashes (and maybe other characters; see if you can find out) always have a backslash in front of them, so do another find-and-replace for that.
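That second find-and-replace could look like the sketch below. It assumes any backslash-prefixed character should simply lose its backslash, which is an assumption, since only \/ is confirmed by the response in the question. BackslashUnescape is a hypothetical name. Run it after the \xHH decoding, otherwise \x3c would be turned into plain x3c first.

```java
public class BackslashUnescape {
    // Removes the backslash in front of any character, e.g. "\/" -> "/".
    // Regex: a literal backslash followed by one character captured as group 1;
    // "$1" puts back just that character.
    public static String unescapeBackslashes(String input) {
        return input.replaceAll("\\\\(.)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(unescapeBackslashes("http:\\/\\/wordnetweb.princeton.edu\\/perl\\/webwn"));
        // http://wordnetweb.princeton.edu/perl/webwn
    }
}
```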


Thanks!!

Take care: in the for loop the operator must be "<=", otherwise the last escape sequence in the string can't be decoded.

for(int i=0;i<=input.length()-4;i++) {..}

Cheers!
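To see what the bound changes: the loop examines 4-character windows starting at index i, and the last window that fits starts at input.length() - 4. With "<", that final window is never examined, so an escape at the very end of the string (such as the closing \x3e in the question) is left undecoded. The check below is a hypothetical illustration that only counts the escapes each bound reaches.

```java
public class LoopBoundDemo {
    // Counts how many "\x" escape starts the window loop reaches,
    // using either "<=" (inclusive) or "<" as the loop bound.
    static int escapesFound(String input, boolean inclusive) {
        int found = 0;
        int last = input.length() - 4; // start index of the last full window
        for (int i = 0; inclusive ? i <= last : i < last; i++) {
            if (input.startsWith("\\x", i)) {
                found++;
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "\\x3ca\\x3e"; // two escapes; the second one ends the string
        System.out.println(escapesFound(s, true));  // 2
        System.out.println(escapesFound(s, false)); // 1
    }
}
```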


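This works for me:

    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;

    // Rewrites every \xHH as %HH, then lets URLDecoder decode the percent escapes
    public static String convertUTF8Units_version2(String input) throws UnsupportedEncodingException
    {
         return URLDecoder.decode(input.replaceAll("\\\\x", "%"), "UTF-8");
    }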
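One caveat with the URLDecoder trick is that the whole input is treated as URL-encoded text. Any literal '+' in the response becomes a space, and a literal '%' is either misdecoded or makes URLDecoder throw IllegalArgumentException. The sketch below percent-encodes those two characters first. SaferDecode and decodeHexEscapes are hypothetical names, and like the original it leaves \/ sequences alone.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class SaferDecode {
    // Variant of convertUTF8Units_version2 that protects literal '%' and '+'
    public static String decodeHexEscapes(String input) {
        String percentEncoded = input
                .replace("%", "%25")   // a literal '%' must not start a %HH sequence
                .replace("+", "%2B")   // otherwise URLDecoder turns '+' into a space
                .replace("\\x", "%");  // each backslash-x escape becomes %HH
        try {
            return URLDecoder.decode(percentEncoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is guaranteed, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeHexEscapes("a+b \\x3d 100%")); // a+b = 100%
    }
}
```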