
Conversion of character set in C# [duplicate]

Developer https://www.devze.com 2023-03-01 21:10 Source: web
This question already has answers here: Closed 11 years ago.

Possible Duplicate:

How do I remove diacritics (accents) from a string in .NET?

Our project generates a string (Mērā nāma nitina hai) in a web page. When we read it with the Regex.Match function, we get a string in which these special characters have been turned into HTML character references such as &#257; in place of ā. We want to convert them back into 'a' or 'ā' so that we can use the string later in the program. Thanks


I'm not sure my method is entirely correct, but it works for me:

[EDIT]

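using System.Text;           // Encoding
using System.Windows.Forms;  // MessageBox (WinForms application)

string first = @"M&#275;r&#257; n&#257;ma nitina hai";
// Decode HTML character references such as &#257; into real characters (needs System.Web.dll).
first = System.Web.HttpUtility.HtmlDecode(first);

// Unicode -> code page 1252 -> Unicode; on .NET Framework, characters 1252 lacks are best-fit mapped (e.g. ā -> a).
byte[] ansi = Encoding.Convert(Encoding.Unicode, Encoding.GetEncoding(1252), Encoding.Unicode.GetBytes(first));
string output = Encoding.Unicode.GetString(Encoding.Convert(Encoding.GetEncoding(1252), Encoding.Unicode, ansi));
MessageBox.Show(output);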

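The main idea of this code: decode the HTML references, then convert the string to ANSI (code page 1252) and back to Unicode. Characters that code page 1252 cannot represent, such as ā and ē, are replaced by their closest unaccented equivalents during the conversion (best-fit mapping, which .NET Framework applies by default), so those diacritics disappear. Accented letters that do exist in code page 1252, such as é, survive the round trip unchanged.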


How about this:

var correctStr = HttpUtility.HtmlDecode(@"M&#275;r&#257; n&#257;ma nitina hai");

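Explanation: &#257; is an HTML numeric character reference for the accented character ā, whose Unicode code point is 257 (U+0101). HtmlDecode turns every such reference back into the character it stands for.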
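A small sketch of that explanation, under two assumptions: it uses System.Net.WebUtility.HtmlDecode instead of HttpUtility because it ships with the base class library and needs no System.Web.dll reference, and DecodeNumericReferences is a hypothetical helper (not part of .NET) that decodes decimal references by hand to show that the number is simply the Unicode code point.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class EntityDemo
{
    // Hypothetical helper: replaces decimal references like "&#257;" with the
    // character whose Unicode code point is that number.
    // Hex (&#x101;) and named (&amacr;) references are NOT handled here.
    public static string DecodeNumericReferences(string s) =>
        Regex.Replace(s, @"&#(\d+);",
            m => char.ConvertFromUtf32(int.Parse(m.Groups[1].Value)));

    public static void Main()
    {
        string encoded = "M&#275;r&#257; n&#257;ma nitina hai";

        // Library decoder: handles numeric and named references.
        string viaLibrary = WebUtility.HtmlDecode(encoded);
        string byHand = DecodeNumericReferences(encoded);

        // \u0113 is ē and \u0101 is ā; escapes avoid console/source encoding issues.
        Console.WriteLine(viaLibrary == "M\u0113r\u0101 n\u0101ma nitina hai"); // True
        Console.WriteLine(byHand == viaLibrary);                                // True
        Console.WriteLine((int)'\u0101');                                       // 257
    }
}
```

In practice the library call is the better choice; the hand-written version only illustrates what the reference means and would need extra care (hex forms, malformed or out-of-range numbers) before real use.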


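You need to use the String.Normalize method: normalize the decoded string to NormalizationForm.FormD, which splits each accented character into its base letter followed by combining marks, then drop the combining marks.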
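A minimal sketch of the usual Normalize-based approach; the helper name RemoveDiacritics is illustrative, not taken from the answer. It assumes the HTML references have already been decoded, since Normalize does nothing to text like "&#257;".

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsDemo
{
    // Strips combining marks after canonical decomposition.
    public static string RemoveDiacritics(string text)
    {
        // FormD splits "ā" (U+0101) into "a" + U+0304 COMBINING MACRON.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing (combining) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        // Recompose whatever remains into the standard composed form.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        string input = "M\u0113r\u0101 n\u0101ma nitina hai"; // "Mērā nāma nitina hai"
        Console.WriteLine(RemoveDiacritics(input));          // Mera nama nitina hai
    }
}
```

Unlike the code page 1252 round trip, this removes marks from every character that has a canonical decomposition (é becomes e as well), but letters with no decomposition, such as ø or ł, are left unchanged.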

