'£' character does not seem to encode properly - expect '%a3' but get '%u00a3'

Developer https://www.devze.com 2022-12-25 05:50 Source: Internet
I want to send the pound sign character, i.e. '£', encoded as ISO-8859-1 across the wire. I do this as follows:

var _encoding = Encoding.GetEncoding("iso-8859-1");
var _requestContent = _encoding.GetBytes(requestContent);
var _request = (HttpWebRequest)WebRequest.Create(target);

// Note: Content-Encoding normally names a compression coding (e.g. gzip);
// the charset is already declared in ContentType below.
_request.Headers[HttpRequestHeader.ContentEncoding] = _encoding.WebName;
_request.Method = "POST";
_request.ContentType = "application/x-www-form-urlencoded; charset=iso-8859-1";
_request.ContentLength = _requestContent.Length;

// Disposing the stream flushes and closes it.
using (var _requestStream = _request.GetRequestStream())
{
    _requestStream.Write(_requestContent, 0, _requestContent.Length);
}

When I put a breakpoint at the target, I expect to receive '%a3', but I receive '%u00a3' instead. We have tested many unusual characters, but '£' seems to be the only one that causes a problem.

Does anyone know what the problem is here? - Help would be greatly appreciated...

Billy


From what I can see, they are equivalent. If the server chokes, well then the server probably does not support escaped Unicode.
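
If the receiving side cannot decode the %uXXXX form, one option is to produce the byte-based form on the sending side. The question does not show how requestContent is built, so the following is only a minimal sketch under assumptions: the class FormBodySketch, the helper BuildFormBody, and the field name "symbol" are hypothetical. It relies on HttpUtility.UrlEncode(string, Encoding), which percent-encodes the bytes the given encoding produces (on .NET Framework this needs a reference to System.Web).

```csharp
using System;
using System.Text;
using System.Web;

public static class FormBodySketch
{
    // Hypothetical helper: percent-encodes one form field using the given charset.
    public static string BuildFormBody(string name, string value, Encoding encoding)
    {
        return HttpUtility.UrlEncode(name, encoding) + "=" + HttpUtility.UrlEncode(value, encoding);
    }

    public static void Main()
    {
        var latin1 = Encoding.GetEncoding("iso-8859-1");

        // "\u00a3" is the pound sign; the escape avoids source-file encoding issues.
        string body = BuildFormBody("symbol", "\u00a3", latin1);
        Console.WriteLine(body); // symbol=%a3

        // The encoded body is pure ASCII, so these bytes are what goes on the wire.
        byte[] payload = latin1.GetBytes(body);
        Console.WriteLine(payload.Length);
    }
}
```

The resulting payload could then be written to the request stream in place of _requestContent.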


From a bit of research I found:

ISO-8859-1 is divided into 2 groups of characters: (ref: http://en.wikipedia.org/wiki/ISO_8859-1)

The lower range 20 to 7E - where all characters seem to be encoded correctly
The higher range A0 to FF - where all characters seem to encode to their Unicode equivalent value

As '£' is in the higher range A0 to FF, it gets encoded to %u00a3. In fact, when I use the first few characters of the higher range A0 to FF, i.e. '¡¢£¤¥¦§¨©ª«¬®', I get '%u00a1%u00a2%u00a3%u00a4%u00a5%u00a6%u00a7%u00a8%u00a9%u00aa%u00ab%u00ac%u00ae'. This behaviour is consistent.

The question I now have is: why do characters in the higher range A0 to FF get encoded to their Unicode value, and not to their equivalent ISO-8859-1 value?

%u00a1%u00a2%u00a3%u00a4%u00a5%u00a6%u00a7%u00a8%u00a9%u00aa%u00ab%u00ac+%u00ae
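
One way to look at the difference: standard percent-encoding works on bytes, so its result depends on the charset, whereas the %uXXXX notation carries a UTF-16 code unit and does not depend on any charset. The sketch below is a hand-rolled illustration, not the framework's implementation; the class PercentEncodingSketch and the method PercentEncode are hypothetical names. It escapes every byte outside the RFC 3986 unreserved set.

```csharp
using System;
using System.Text;

public static class PercentEncodingSketch
{
    // Percent-encodes each byte of the encoded string unless it is an
    // unreserved ASCII character (letters, digits, '-', '_', '.', '~').
    public static string PercentEncode(string value, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte b in encoding.GetBytes(value))
        {
            char c = (char)b;
            bool unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                              (c >= '0' && c <= '9') ||
                              c == '-' || c == '_' || c == '.' || c == '~';
            if (unreserved)
                sb.Append(c);
            else
                sb.Append('%').Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // ISO-8859-1 stores the pound sign as the single byte A3.
        Console.WriteLine(PercentEncode("\u00a3", Encoding.GetEncoding("iso-8859-1"))); // %a3
        // UTF-8 stores it as the two bytes C2 A3.
        Console.WriteLine(PercentEncode("\u00a3", Encoding.UTF8));                      // %c2%a3
    }
}
```

Note that this sketch writes a space as %20 rather than the '+' used in form encoding.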
