Conversion from UTF8 to ASCII

Developer https://www.devze.com 2023-01-28 16:14 Source: Internet
I have some text read from an XML file stored in UTF-8 encoding. C# reads it perfectly (I checked with the debugger), but when I try to convert it to ASCII to save it in another file, I get a ? character wherever there was a non-ASCII character. For instance, this text:

string s = "La introducción masiva de las nuevas tecnologías de la información";

Will be saved as

"La introducci?n masiva de las nuevas tecnolog?as de la informaci?n"

I cannot just replace them with their plain Latin vowels (a, e, i, o, u) because some Spanish words would lose their meaning. I've already tried the answers to two other questions with no success, so I'm hoping someone can help me. The selected answer in the second one didn't even compile!

In case someone wants to take a look, my code is this one:

private void WriteInput( string input )
{
   byte[] byteArray = Encoding.UTF8.GetBytes(input);
   byte[] asciiArray = Encoding.Convert(Encoding.UTF8, Encoding.ASCII, byteArray);
   string finalString = Encoding.ASCII.GetString(asciiArray);

   string inputFile = _idFile + ".in";
   var batchWriter = new StreamWriter(inputFile, false, Encoding.ASCII);
   batchWriter.Write(finalString);
   batchWriter.Close();
}


Those characters have no mapping in ASCII, which only defines code points 0 to 127; the encoder substitutes ? for anything outside that range. Review an ASCII table, like Wikipedia's, to verify this. You might be interested in the Windows-1252 encoding (sometimes loosely called "extended ASCII"), which has code points for many accented characters, including the Spanish ones.

var input = "La introducción masiva de las nuevas tecnologías de la información";
var utf8bytes = Encoding.UTF8.GetBytes(input);
var win1252Bytes = Encoding.Convert(
                Encoding.UTF8, Encoding.GetEncoding("windows-1252"), utf8bytes);
File.WriteAllBytes(@"foo.txt", win1252Bytes);
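
Applied to the WriteInput method from the question, the same idea might look like the sketch below. It is only an illustration under some assumptions: the InputWriter class and the path parameter are hypothetical names, the RegisterProvider call is needed on .NET Core / .NET 5+ (where code-page encodings are not registered by default) and may require the System.Text.Encoding.CodePages package on older frameworks, and the exception fallback is an optional choice that makes unmappable characters fail loudly instead of silently turning into ?.

```csharp
using System;
using System.IO;
using System.Text;

static class InputWriter
{
    // Hypothetical helper: writes the text to a file encoded as Windows-1252.
    public static void WriteInput(string path, string input)
    {
        // Makes code-page encodings such as windows-1252 available
        // on .NET Core / .NET 5+; calling it more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        // Ask for exception fallbacks so a character with no Windows-1252
        // mapping throws EncoderFallbackException instead of becoming '?'.
        Encoding win1252 = Encoding.GetEncoding(
            "windows-1252",
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        // The string is already UTF-16 in memory, so no intermediate
        // UTF-8 byte array is needed; encode once when writing.
        File.WriteAllText(path, input, win1252);
    }
}
```

Whatever reads the file afterwards has to be told it is Windows-1252; if the consumer really accepts only 7-bit ASCII, this approach will not help.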


Can't be done. ASCII does not have those letters, so the best you can do is to URL-encode or unicode-escape-encode them.
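
A minimal sketch of the unicode-escape option, assuming the consumer of the file knows how to decode \uXXXX sequences. AsciiEscaper and EscapeNonAscii are hypothetical names made up for this example, not framework APIs.

```csharp
using System;
using System.Text;

static class AsciiEscaper
{
    // Replaces every UTF-16 code unit above 0x7F with a \uXXXX escape,
    // leaving plain ASCII characters untouched.
    public static string EscapeNonAscii(string input)
    {
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 0x7F)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }
}
```

With this, "La introducción" becomes "La introducci\u00F3n", which can be written safely with Encoding.ASCII. Note that the sketch does not escape backslashes already present in the input, so it is only reversible if the text contains no literal \u sequences. For the URL-encoding option, the framework's Uri.EscapeDataString percent-encodes the UTF-8 bytes instead (ó becomes %C3%B3), though it also escapes spaces and other reserved characters.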
