C# Remove accent from character?

Developer https://www.devze.com 2023-03-11 19:39 Source: web
How can I convert á to a in C#?

For instance: aéíúö => aeiuo

Um, having read those threads [I didn't know they were called diacritics, so I couldn't possibly search for that].

I want to "drop" all diacritics except ñ.

Currently I have:

public static string RemoveDiacritics(this string text)
{
    string normalized = text.Normalize(NormalizationForm.FormD);
    var sb = new StringBuilder();

    foreach (char c in from c in normalized
                       let u = CharUnicodeInfo.GetUnicodeCategory(c)
                       where u != UnicodeCategory.NonSpacingMark
                       select c)
    {
        sb.Append(c);
    }

    return sb.ToString().Normalize(NormalizationForm.FormC);
}

What would be the best way to leave ñ out of this?

My solution was to do the following after the foreach:

var result = sb.ToString();

if (text.Length != result.Length)
    throw new ArgumentOutOfRangeException();

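int position = -1;
// IndexOf returns -1 when nothing is found, so test >= 0; "> 0" would skip a leading ñ.
while ((position = text.IndexOf('ñ', position + 1)) >= 0)
{
    result = result.Remove(position, 1).Insert(position, "ñ");
}

// Return the patched string; sb.ToString() would discard the restored ñ characters.
return result;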

But I'd assume there is a less "manual" way to do this?
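
One less manual variant, as a rough sketch rather than a definitive answer: stay with the FormD normalization from the question, but keep the combining tilde (U+0303) whenever it directly follows n or N, so that the final FormC pass recomposes it into ñ or Ñ. The class and method names below are hypothetical, and the sketch assumes ñ is the only accented letter that should survive.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical helper name; not part of the original posts.
    public static string RemoveDiacriticsExceptEnye(string text)
    {
        // FormD splits accented letters into base letter + combining mark(s).
        string normalized = text.Normalize(NormalizationForm.FormD);
        var sb = new StringBuilder(normalized.Length);

        for (int i = 0; i < normalized.Length; i++)
        {
            char c = normalized[i];

            // Everything that is not a combining mark is copied unchanged.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                sb.Append(c);
                continue;
            }

            // Keep U+0303 COMBINING TILDE only when it directly follows n or N;
            // every other combining mark is dropped.
            bool tildeOnN = c == '\u0303'
                && i > 0
                && (normalized[i - 1] == 'n' || normalized[i - 1] == 'N');
            if (tildeOnN)
            {
                sb.Append(c);
            }
        }

        // FormC recomposes n + U+0303 back into the single character ñ.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With this sketch, `DiacriticsSketch.RemoveDiacriticsExceptEnye("aéíúö ñ")` returns `"aeiuo ñ"`, and no position bookkeeping or length check is needed.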


If you don't want to remove the ñ, this is an option. It's fast.

    static string[] pats3 = { "é", "É", "á", "Á", "í", "Í", "ó", "Ó", "ú", "Ú" };
    static string[] repl3 = { "e", "E", "a", "A", "i", "I", "o", "O", "u", "U" };
    static Dictionary<string, string> _var = null;
    static Dictionary<string, string> dict
    {
        get
        {
            if (_var == null)
            {
                _var = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value);
            }

            return _var;
        }
    }
    private static string RemoveAccent(string text)
    {
        // using Zip as a shortcut, otherwise setup dictionary differently as others have shown
        //var dict = pats3.Zip(repl3, (k, v) => new { Key = k, Value = v }).ToDictionary(o => o.Key, o => o.Value);

        //string input = "åÅæÆäÄöÖøØèÈàÀìÌõÕïÏ";
        string pattern = String.Join("|", dict.Keys.Select(k => k)); // use ToArray() for .NET 3.5
        string result = Regex.Replace(text, pattern, m => dict[m.Value]);

        //Console.WriteLine("Pattern: " + pattern);
        //Console.WriteLine("Input: " + text);
        //Console.WriteLine("Result: " + result);

        return result;
    }
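
The same lookup idea can also be sketched without a regex, by mapping characters one at a time. This is only an illustration: the class name is hypothetical, the table holds the same ten vowel pairs as pats3/repl3, and it assumes the input uses precomposed characters (decomposed input such as e + U+0301 would pass through unchanged, as it would with the regex version).

```csharp
using System.Collections.Generic;
using System.Text;

public static class AccentMapSketch
{
    // Same pairs as pats3/repl3 above; ñ is not listed, so it is left alone.
    private static readonly Dictionary<char, char> Map = new Dictionary<char, char>
    {
        { 'é', 'e' }, { 'É', 'E' },
        { 'á', 'a' }, { 'Á', 'A' },
        { 'í', 'i' }, { 'Í', 'I' },
        { 'ó', 'o' }, { 'Ó', 'O' },
        { 'ú', 'u' }, { 'Ú', 'U' }
    };

    public static string RemoveAccent(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            char replacement;
            // Use the mapped character when there is one, otherwise keep the original.
            sb.Append(Map.TryGetValue(c, out replacement) ? replacement : c);
        }
        return sb.ToString();
    }
}
```

The table is built once in a static initializer, so no pattern string has to be joined on every call.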

If you also want to remove the ñ, a faster option is: Encoding.ASCII.GetString(Encoding.GetEncoding("Cyrillic").GetBytes(text));
