Decode/encode between String and byte[] in Java

import java.util.Arrays;

byte[] bytes = new byte[] { 1, -1 };
System.out.println(Arrays.toString(new String(bytes, "UTF-8").getBytes("UTF-8")));
System.out.println(Arrays.toString(new String(bytes, "ISO-8859-1").getBytes("ISO-8859-1")));

output:

[1, -17, -65, -67]
[1, -1]

Why does the UTF-8 round trip change the bytes while the ISO-8859-1 round trip preserves them?


Your byte array isn't a valid UTF-8-encoded string... so the string you get from

new String(bytes, "UTF-8")

contains U+0001 (for the first byte) and U+FFFD, the replacement character, substituted for the invalid second byte. When that string is encoded back to UTF-8, you get the byte pattern shown.

Basically you shouldn't try to interpret arbitrary binary data as if it were encoded in a particular encoding. If you want to represent arbitrary binary data as a string, use something like base64.
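For example, a minimal sketch using java.util.Base64 (available since Java 8; the class name here is just for illustration) showing binary data surviving a round trip through a String:

import java.util.Arrays;
import java.util.Base64;

public class Base64RoundTrip {
    public static void main(String[] args) {
        byte[] bytes = new byte[] { 1, -1 };

        // Encode the raw bytes to a pure-ASCII string.
        String encoded = Base64.getEncoder().encodeToString(bytes);
        System.out.println(encoded); // Af8=

        // Decode it back: the original bytes come through intact.
        byte[] decoded = Base64.getDecoder().decode(encoded);
        System.out.println(Arrays.toString(decoded)); // [1, -1]
    }
}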


-1 (the byte 0xFF) is not valid in UTF-8-encoded text. [-17, -65, -67] is the UTF-8 encoding (0xEF 0xBF 0xBD) of the replacement character U+FFFD that gets substituted for it.
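You can check this directly; a short sketch (class name illustrative) encoding the replacement character U+FFFD in UTF-8:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementBytes {
    public static void main(String[] args) {
        // U+FFFD is the Unicode replacement character.
        byte[] replacement = "\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.toString(replacement)); // [-17, -65, -67]
    }
}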


String isn't a container for binary data; it's a container for chars. -1 isn't a legal value for a char, and the byte 0xFF on its own isn't legal UTF-8, so there's no reason what you're doing should ever work. Ergo, don't do it.
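For what it's worth, the reason the ISO-8859-1 round trip in the question is lossless is that ISO-8859-1 defines a character for every byte value 0x00 through 0xFF. A quick sketch (class name illustrative) verifying that across all 256 values, though this is still not a recommended way to carry binary data:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    public static void main(String[] args) {
        // Every byte value maps to exactly one char in ISO-8859-1,
        // so decode followed by encode is the identity on bytes.
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        byte[] back = new String(all, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(all, back)); // true
    }
}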

