String replace diacritics in C# [duplicate]

Source: https://www.devze.com, 2022-12-25 05:21 (reposted from the web)
This question already has answers here: How do I remove diacritics (accents) from a string in .NET? (22 answers) Closed 9 years ago.

I'd like to use this method to create user-friendly URLs. Because my site is in Croatian, there are characters that I don't want to strip but to replace with other characters. For example, this string:

ŠĐČĆŽ šđčćž

needs to be:

sdccz-sdccz

So, I would like to make two arrays: one containing the characters to be replaced, and another containing the replacement characters:

string[] character = { "Š", "Đ", "Č", "Ć", "Ž", "š", "đ", "č", "ć", "ž" };
string[] characterReplace = { "s", "d", "c", "c", "z", "s", "d", "c", "c", "z" };

Finally, these two arrays should be used in a method that takes a string, finds the matches and replaces them. In PHP I used the preg_replace function to deal with this. In C# this doesn't work:

s = Regex.Replace(s, character, characterReplace);

I would appreciate it if someone could help.


It seems you want to strip off diacritics and leave the base character. I'd recommend Ben Lings's solution here for this:

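// Requires: using System.Globalization; using System.Linq; using System.Text;
string input = "ŠĐĆŽ šđčćž";
// FormD splits each accented letter into a base letter plus combining marks.
string decomposed = input.Normalize(NormalizationForm.FormD);
// Drop the combining marks and keep everything else.
char[] filtered = decomposed
    .Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
    .ToArray();
string newString = new String(filtered);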

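Edit: Slight problem! It doesn't work for Đ and đ, because those letters have no Unicode decomposition into a base letter plus a combining mark, so normalization leaves them untouched. The result is: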

SĐCZ sđccz
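
One possible workaround, sketched here on the assumption that Đ and đ are the only letters needing special treatment, is to replace those two characters explicitly before normalizing. The names DiacriticsDemo and RemoveDiacritics are illustrative and do not come from the linked answer.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

public static class DiacriticsDemo
{
    // Hypothetical helper: handles Đ/đ by hand (they do not decompose under
    // FormD), then strips the combining marks from everything else.
    public static string RemoveDiacritics(string input)
    {
        // Assumption: Đ and đ are the only non-decomposable letters in the input.
        string prepared = input.Replace('Đ', 'D').Replace('đ', 'd');

        // Split accented letters into base letter + combining marks.
        string decomposed = prepared.Normalize(NormalizationForm.FormD);

        // Keep every character that is not a combining (non-spacing) mark.
        char[] filtered = decomposed
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray();

        return new string(filtered);
    }
}
```

Calling DiacriticsDemo.RemoveDiacritics("ŠĐĆŽ šđčćž") should give "SDCZ sdccz". Other letters that lack a decomposition (for example ł or ø) would need their own explicit replacements.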


Jon Skeet mentioned the following code on a newsgroup...

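// Requires: using System.Text;
static string RemoveAccents (string input)
{
    // FormKD decomposes accented letters into base letters plus combining marks.
    string normalized = input.Normalize(NormalizationForm.FormKD);
    // ASCII encoding whose fallbacks replace unencodable characters with nothing.
    Encoding removal = Encoding.GetEncoding(Encoding.ASCII.CodePage,
                                            new EncoderReplacementFallback(""),
                                            new DecoderReplacementFallback(""));
    byte[] bytes = removal.GetBytes(normalized);
    return Encoding.ASCII.GetString(bytes);
}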

EDIT

Maybe I am crazy, but I just ran the following...

Dim Input As String = "ŠĐĆŽ-šđčćž"
Dim Builder As New StringBuilder()

For Each Chr As Char In Input
    Builder.Append(Chr)
Next

Console.Write(Builder.ToString())

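And the output was SDCZ-sdccz. The loop only copies characters, so this is presumably the console's output encoding substituting best-fit ASCII characters when printing, not a change to the string itself.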


A dictionary would be a logical solution to this...

// Requires: using System.Collections.Generic; using System.Text;
static string ReplaceAccents(string input)
{
    // Map each accented character to its replacement.
    Dictionary<char, char> accentEquivalents = new Dictionary<char, char>();
    accentEquivalents.Add('Š', 's');
    //...add other equivalents

    StringBuilder fixedString = new StringBuilder(input);
    for (int i = 0; i < fixedString.Length; i++)
        if (accentEquivalents.ContainsKey(fixedString[i]))
            fixedString[i] = accentEquivalents[fixedString[i]];
    return fixedString.ToString();
}

Use a StringBuilder for character-by-character edits like this. Strings in C# are immutable, so changing one character at a time would allocate a new string object for every change, whereas a StringBuilder is mutable and is modified in place.
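
To make the dictionary approach concrete, here is a minimal sketch that fills the table from the two arrays in the question. The names SlugDemo and ToSlug are illustrative, and mapping a space to a hyphen is an assumption taken from the expected output in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SlugDemo
{
    // Replacement table built from the question's two arrays.
    private static readonly Dictionary<char, char> Replacements = new Dictionary<char, char>
    {
        { 'Š', 's' }, { 'Đ', 'd' }, { 'Č', 'c' }, { 'Ć', 'c' }, { 'Ž', 'z' },
        { 'š', 's' }, { 'đ', 'd' }, { 'č', 'c' }, { 'ć', 'c' }, { 'ž', 'z' },
        { ' ', '-' } // assumption: spaces become hyphens, as in the expected output
    };

    // Hypothetical helper: copies the input, swapping any character found in the table.
    public static string ToSlug(string input)
    {
        var builder = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            char replacement;
            builder.Append(Replacements.TryGetValue(c, out replacement) ? replacement : c);
        }
        return builder.ToString();
    }
}
```

With the five-letter input "ŠĐČĆŽ šđčćž", SlugDemo.ToSlug should return "sdccz-sdccz". Characters that are not in the table pass through unchanged, so a real URL helper would probably also lowercase the result and drop punctuation.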
