I'm trying to fix a problem in an Android app. The app posts an HTTP request to a web service. When the text in the request contains the Swedish characters Å, Ä and Ö, it doesn't work. The people who run the web service say that the request has to be encoded in UTF-8, and that it currently isn't.
The app uses org.apache.http.impl.client.DefaultHttpClient, and I assume this line says that UTF-8 should be used: HttpProtocolParams.setContentCharset(params, "UTF-8");
I used Wireshark to see what the app sends, and the string "TeståäöÅÄÖéüà" is shown as: "Test\345\344\366\305\304\326\351\374\340"
According to this table, the numbers are the octal representation of the Unicode code points for the characters. That's something different from UTF-8, right?
If it were UTF-8, would the special characters be represented by two bytes each, e.g. "c3 a5" for "å" and "c3 a4" for "ä"?
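A quick way to check the expected byte sequences is to encode the characters in both charsets and print the bytes as hex. This is only a minimal sketch: the class name `EncodingBytes` and the helper `hex` are made up for illustration, and `StandardCharsets` assumes Java 7 or later (on Android, API level 19 or later).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingBytes {
    // Encode the text with the given charset and return the bytes as
    // space-separated lowercase hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String aRing = "\u00e5"; // the letter a with ring above
        String aUmlaut = "\u00e4"; // the letter a with diaeresis
        System.out.println(hex(aRing, StandardCharsets.UTF_8));      // c3 a5
        System.out.println(hex(aUmlaut, StandardCharsets.UTF_8));    // c3 a4
        System.out.println(hex(aRing, StandardCharsets.ISO_8859_1)); // e5
    }
}
```

The string literals use Unicode escapes so that the result does not depend on the encoding of the source file.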
So:
1. Do I understand the Unicode vs. UTF-8 distinction correctly?
2. Am I right that what's being sent is NOT UTF-8 encoded?
3. How do I make the DefaultHttpClient send in UTF-8?
Jon
As pointed out by Stephen, you must distinguish between the encoding used in the HTTP header (for the URL) and the encoding used in the request body.
Anyway, the distinction is not Unicode vs. UTF-8: UTF-8 is one of the character encodings for Unicode (UTF-16 is another).
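For the URL side, the charset decides which percent-escapes end up on the wire. The following is a rough sketch using only the standard `java.net.URLEncoder` (which applies form-style encoding, so a space becomes `+`); the class name `UrlEncodingDemo` and the helper `percentEncode` are hypothetical and not part of the app in the question.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncodingDemo {
    // Percent-encode text for use in a query string, using the named charset
    // to turn characters into bytes before escaping them.
    static String percentEncode(String text, String charsetName) {
        try {
            return URLEncoder.encode(text, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        String text = "Test\u00e5"; // "Test" followed by a with ring above
        System.out.println(percentEncode(text, "UTF-8"));      // Test%C3%A5
        System.out.println(percentEncode(text, "ISO-8859-1")); // Test%E5
    }
}
```

The request body is encoded separately from the URL, so getting one right does not fix the other.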
And apparently you are not sending a Unicode encoding at all, but old Latin-1 (ISO 8859-1): one byte for each character. It just happens that the first 256 Unicode code points coincide with the values used by Latin-1, which is why the bytes you captured look like Unicode code points.
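To see this for yourself, you can encode the test string from the question as Latin-1 and print it with octal escapes, roughly the way the Wireshark output is shown above. This is just a sketch under the assumption that non-printable bytes are displayed as a backslash plus three octal digits; `WireDump`, `octalEscape` and `bytesEqualCodePoints` are names invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class WireDump {
    // Printable ASCII bytes are kept as-is; every other byte becomes a
    // backslash followed by three octal digits.
    static String octalEscape(String text, Charset charset) {
        StringBuilder out = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F) {
                out.append((char) v);
            } else {
                out.append('\\').append(String.format("%03o", v));
            }
        }
        return out.toString();
    }

    // True if every Latin-1 byte has the same numeric value as the
    // corresponding character's Unicode code point.
    static boolean bytesEqualCodePoints(String text) {
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1);
        if (latin1.length != text.length()) {
            return false;
        }
        for (int i = 0; i < latin1.length; i++) {
            if ((latin1[i] & 0xFF) != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The test string from the question, written with Unicode escapes.
        String s = "Test\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6\u00e9\u00fc\u00e0";
        // Prints Test\345\344\366\305\304\326\351\374\340
        System.out.println(octalEscape(s, StandardCharsets.ISO_8859_1));
        // In UTF-8 each of these characters takes two bytes, e.g. \303\245
        System.out.println(octalEscape(s, StandardCharsets.UTF_8));
        System.out.println(bytesEqualCodePoints(s)); // true
    }
}
```

The Latin-1 line matches the capture in the question byte for byte, while the UTF-8 line shows two escapes per accented character.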
Do yourself a favour and read up on the basics of Unicode. It should take you a day or two, and it's very valuable and necessary knowledge for any programmer today (and tomorrow).