Convert UCS-2 characters to UTF-8 Using C#

Developer https://www.devze.com 2023-02-02 03:43 Source: Internet

I'm pulling some internationalized text from an MS SQL Server 2005 database. As per the defaults for that DB, the characters are stored as UCS-2. However, I need to output the data in UTF-8 format, as I'm sending it out over the web. Currently, I have the following code to convert:

SqlString dbString = resultReader.GetSqlString(0);
byte[] dbBytes = dbString.GetUnicodeBytes();
byte[] utf8Bytes = System.Text.Encoding.Convert(System.Text.Encoding.Unicode, 
    System.Text.Encoding.UTF8, dbBytes);
System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding();
string outputString = encoder.GetString(utf8Bytes);

However, when I examine the output in the browser, it appears to be garbage, no matter what I set the encoding to.

What am I missing?

EDIT: In response to the answers below, the reason I thought I had to perform a conversion is because I can output literal multibyte strings just fine. For example:

OutputControl.Text = "カルフォルニア工科大学とチューリッヒ工科大学は共同で、太陽光を保管可能な燃料に直接変えることのできる装置の開発に成功したとのこと";

works. Here, OutputControl is an ASP.Net Literal. However,

OutputControl.Text = outputString; //Output from above snippet

results in mangled output as described above. My hypothesis was that the database's output was somehow getting mangled by ASP.Net. If that's not the case, then what are some other possibilities?

EDIT 2: Okay, I'm stupid. It turns out that there's nothing wrong with the database at all. When I tried inserting my own literal double byte characters (材料,原料;木料), I could read and output them just fine even without any conversion process at all. It seems to me that whatever is inserting the data into the DB is mangling the characters somehow, so I'm going to look at that. With my verified, "clean" data, the following code works:

OutputControl.Text = dbString.ToString();

as the responses below indicate it should.


Your code does essentially the same as:

SqlString dbString = resultReader.GetSqlString(0);
string outputString = dbString.ToString();

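A .NET string is itself a Unicode string (specifically UTF-16, which is identical to UCS-2 for code points that fit into 16 bits; code points above U+FFFF are stored as surrogate pairs, which UCS-2 cannot represent). In other words, the conversions you are performing are redundant: they turn the UTF-16 string into UTF-8 bytes and then decode those bytes straight back into the same UTF-16 string.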
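To make the redundancy concrete, here is a minimal sketch that walks through the same steps as the code in the question, but starts from a plain string rather than a SqlString. It assumes the value was read from the database correctly, and that GetUnicodeBytes() returns UTF-16 little-endian bytes (the same bytes Encoding.Unicode.GetBytes produces). The class and method names are made up for illustration.

```csharp
using System;
using System.Text;

public static class RedundantConversionDemo
{
    // Hypothetical helper mirroring the question's conversion chain.
    public static string RoundTrip(string original)
    {
        // UTF-16 little-endian bytes of the string (stand-in for GetUnicodeBytes()).
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(original);

        // Re-encode those bytes as UTF-8.
        byte[] utf8Bytes = Encoding.Convert(Encoding.Unicode, Encoding.UTF8, utf16Bytes);

        // Decoding the UTF-8 bytes produces a UTF-16 string again.
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        string original = "材料,原料;木料";
        string result = RoundTrip(original);

        // Prints "True": the round trip gives back the string we started with.
        Console.WriteLine(string.Equals(original, result, StringComparison.Ordinal));
    }
}
```

If the input string is already mangled, the round trip faithfully reproduces the mangled text, so it cannot repair bad data either.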

Your web app most likely mangles the encoding somewhere else as well, or sets a wrong encoding for the HTML output. However, that can't be diagnosed from the information you provided so far.
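One way to narrow down where the mangling happens is to look at the raw UTF-16 code units of the string right after it is read, before ASP.Net touches it. The helper below is a hypothetical diagnostic, not part of any library. If the dump already shows unexpected values (for example runs of 003F, the question mark), the text was probably damaged before or while it was stored, not on output.

```csharp
using System;
using System.Linq;

public static class StringDiagnostics
{
    // Hypothetical helper: lists each UTF-16 code unit of the string as four hex digits.
    public static string DumpCodeUnits(string value)
    {
        return string.Join(" ", value.Select(c => ((int)c).ToString("X4")));
    }

    public static void Main()
    {
        // Prints "0041 30AB" ('A' followed by katakana 'カ').
        Console.WriteLine(DumpCodeUnits("A\u30AB"));
    }
}
```

In the question's code, this could be called on dbString.ToString() and the result compared with the dump of a literal that is known to display correctly.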


String in .net is 'encoding agnostic'.

You can convert bytes to string using a particular encoding to tell .net how to interpret your bytes.

You can convert string to bytes using a particular encoding to tell .net how you want your bytes served.

But trying to convert a string to another string using encodings makes no sense at all.
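As a rough illustration of the two directions described above, the sketch below encodes a string to UTF-8 bytes and decodes it again, then decodes the same bytes with the wrong encoding to show how garbage arises. The class and method names are invented for this example.

```csharp
using System;
using System.Text;

public static class EncodingBoundaryDemo
{
    // string -> bytes: pick the encoding you want the bytes served in.
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // bytes -> string: name the encoding the bytes were written in.
    public static string FromUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    public static void Main()
    {
        byte[] wire = ToUtf8("\u00E9"); // "é"

        // Prints "C3-A9": the two-byte UTF-8 form of U+00E9.
        Console.WriteLine(BitConverter.ToString(wire));

        // Prints "True": decoding with the matching encoding restores the text.
        Console.WriteLine(FromUtf8(wire) == "\u00E9");

        // Prints "True": decoding the same bytes as ISO-8859-1 yields two wrong
        // characters (U+00C3 U+00A9), which is the typical mangled output.
        string wrong = Encoding.GetEncoding("iso-8859-1").GetString(wire);
        Console.WriteLine(wrong == "\u00C3\u00A9");
    }
}
```

Encodings only matter at the boundary between bytes and strings. Once the data is a string, there is nothing left to convert.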
