EditText returns enhanced ISO-8859-1 instead of UTF-8 encoding for German umlauts?

Developer https://www.devze.com 2023-04-13 04:56 Source: Internet

I am more than just confused. I have an EditText, and it apparently returns ISO-8859-1 or even mixed ISO-8859-1 + UTF-8 strings.

My understanding until now was that Android is fully UTF-8, so this shouldn't even be possible.

Examples: Inputting "wüste" into the EditText and converting the resulting string to hex gives 57 fc 73 74 65, whereas I would expect 57 c3bc 73 74 65.

Inputting "wüste テスト" returns 57 fc 73 74 65 20 30c6 30b9 30c8, which now even is a mix of extended ISO-8859-1 and UTF-8.

Is this the expected and intended behaviour? Can I change it somewhere? I noticed it when sending data as JSON to a server, which bailed out because of illegal UTF-8 characters.

Regards, Oliver


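Java (and therefore Android) strings are not UTF-8, but UTF-16. The values displayed are not encoded bytes at all: they are the string's UTF-16 code units, which for these characters are the same as the Unicode code points. ü is U+00FC, which happens to coincide with the ISO-8859-1 byte 0xFC, and that is why the output looks like Latin-1.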
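
To illustrate, here is a minimal sketch of a hex dump that walks a string char by char, which is presumably what the "string to hex" step in the question does. The class and method names are made up for this example. The sample text uses a capital W because 0x57 in the question's dump is 'W', so the actual input was probably capitalized; that is an assumption.

```java
import java.util.StringJoiner;

// Hypothetical helper, not part of the Android or Java API.
final class CodeUnitDump {
    // Hex-dump the UTF-16 code units (Java chars) of a string, one group per char.
    static String codeUnitsHex(String s) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < s.length(); i++) {
            out.add(Integer.toHexString(s.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "Wüste テスト" written with Unicode escapes to avoid source-encoding issues.
        System.out.println(codeUnitsHex("W\u00fcste \u30c6\u30b9\u30c8"));
        // prints: 57 fc 73 74 65 20 30c6 30b9 30c8
    }
}
```

The output matches the dump in the question, so nothing was ever encoded as ISO-8859-1; the numbers are simply code units, not bytes.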

You'll need to convert your string to UTF-8 in order to send it as such (either directly, or via whatever JSON library you are using). Calling getBytes("UTF-8") on the string returns a byte array holding the string in that encoding.
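
A minimal sketch of that conversion follows. It swaps the string-named charset for StandardCharsets.UTF_8 (Java 7+, Android API 19+), which avoids the checked UnsupportedEncodingException; the class and method names are hypothetical and only serve the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the Android or Java API.
final class Utf8Dump {
    // Encode the string as UTF-8 and hex-dump the resulting bytes.
    static String utf8Hex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder out = new StringBuilder();
        for (byte b : bytes) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffffc3 etc.
            out.append(String.format("%02x", b & 0xff));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(utf8Hex("W\u00fcste"));
        // prints: 57 c3 bc 73 74 65
    }
}
```

Here ü becomes the two bytes c3 bc, which is the sequence the question expected to see. These bytes, not the String itself, are what should go on the wire to the server.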
