C#, UTF-8 and encoding characters

Developer https://www.devze.com 2023-02-07 21:40 Source: web
This is a shot-in-the-dark, and I apologize in advance if this question sounds like the ramblings of a madman.

As part of an integration with a third party, I need to UTF8-encode some string info using C# so I can send it to the target server via multipart form. The problem is that they are rejecting some of my submissions, probably because I'm not encoding their contents correctly.

Right now, I'm trying to figure out how a dash or hyphen -- I can't tell which it is just by looking at it -- is received or interpreted by the target server as ?~@~S (yes, that's a 5-character string and is not your browser glitching out). And unfortunately I don't have a thorough enough understanding of Encoding.UTF8.GetBytes() to know how to use the byte array to begin identifying where the problem might lie.

If anybody can provide any tips or advice, I would greatly appreciate it. So far my only friend has been MSDN, and not much of one at that.

UPDATE 1: After some more digging around, I discovered that using System.Web.HttpUtility.UrlEncode() to encode an EM DASH character ("—") percent-encodes it as "%e2%80%94", which is the three-byte UTF-8 sequence for that character.

I'm currently sending this info in an HttpWebRequest POST, with a content type of "application/x-www-form-urlencoded" -- could this be what's causing the problem? And if so, what is the proper way to encode a series of name-value pairs whose values may contain Unicode characters, such that it will be understood by a server expecting a UTF-8 request?


byte[] test = System.Text.Encoding.UTF8.GetBytes("-");

Should give you

test[0] = 0x2D (45 as integer).  

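Verify that you're sending 0x2D to the target server. If the array holds three bytes instead (for example 0xE2 0x80 0x93), the character is not an ASCII hyphen but a Unicode dash.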
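
To tell which dash is actually in the string, it can help to print the code point next to its UTF-8 bytes. The following is a minimal sketch rather than a definitive tool: the class name Utf8Inspector and the helper ToHex are made up for illustration, and the three sample characters (HYPHEN-MINUS, EN DASH, EM DASH) are only the likely candidates, not a confirmed diagnosis of the poster's data.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class Utf8Inspector
{
    // Returns the UTF-8 bytes of s as upper-case hex pairs separated by spaces.
    public static string ToHex(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        // BitConverter.ToString yields "E2-80-93"; swap the separators for spaces.
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        // HYPHEN-MINUS, EN DASH, EM DASH (written as escapes so the source file's
        // own encoding cannot alter them).
        string[] samples = { "-", "\u2013", "\u2014" };
        foreach (string s in samples)
        {
            Console.WriteLine("U+{0:X4} -> {1}", (int)s[0], ToHex(s));
        }
        // Prints:
        // U+002D -> 2D
        // U+2013 -> E2 80 93
        // U+2014 -> E2 80 94
    }
}
```

The EM DASH row matches the "%e2%80%94" reported in the update. As a guess only: the 5-character string "?~@~S" looks like what some terminals and editors show for the raw bytes E2 80 93 when they are not decoding UTF-8, which would point to an EN DASH that was in fact sent as valid UTF-8.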


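You may need to add a "charset=utf-8" parameter to your Content-Type header. A Content-Encoding header will not help here: it describes compression of the body (gzip, for example), not the character set. For a multipart form, the header should contain the following (a real multipart request also carries a boundary parameter); if you are posting "application/x-www-form-urlencoded" as in the update, append the same charset parameter to that type instead: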

Content-Type: multipart/form-data; charset=utf-8

Otherwise, the web server has no way of knowing that your bytes are UTF-8 and may decode them with a different default, which would misinterpret any non-ASCII character.
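
For the "application/x-www-form-urlencoded" case raised in the question, one way to build the body is to percent-encode each name and value as UTF-8 and declare the charset in the Content-Type. This is a sketch under assumptions: FormBodyBuilder, BuildBody and the field names "title" and "note" are hypothetical, and whether the third party's server honors the charset parameter is not known from the thread. Uri.EscapeDataString encodes non-ASCII characters as their UTF-8 bytes and writes spaces as %20 rather than "+".

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper for illustration; not part of any library.
public static class FormBodyBuilder
{
    // Media type plus an explicit charset, as suggested in the answer above.
    public const string ContentType = "application/x-www-form-urlencoded; charset=utf-8";

    // Joins name=value pairs with '&', percent-encoding both sides as UTF-8.
    public static string BuildBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }

    public static void Main()
    {
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("title", "2010\u20132011"), // contains EN DASH
            new KeyValuePair<string, string>("note", "a \u2014 b"),      // contains EM DASH
        };

        string body = BuildBody(fields);
        // After escaping, the body is pure ASCII, so these bytes are what goes on the wire.
        byte[] payload = Encoding.UTF8.GetBytes(body);

        Console.WriteLine(ContentType);
        Console.WriteLine(body);
        Console.WriteLine("ContentLength = " + payload.Length);
        // With an HttpWebRequest, ContentType and ContentLength would be set from
        // these values and payload written to the stream from GetRequestStream().
    }
}
```

Because every non-ASCII character is escaped before the body is turned into bytes, the request stays valid even if an intermediary assumes a single-byte encoding; the charset parameter then tells the server how to decode the escaped bytes.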
