.NET string replace russian to english

开发者 https://www.devze.com 2022-12-30 21:08 Source: the web

I have a strange problem replacing characters in a string.

I read a .txt file containing Russian text. I have a list of Russian-to-English letter mappings (ru=en); I loop over the list and want to replace each Russian character with its English counterpart.

The problem is: in the debugger I can see that both the Russian and the English characters are read correctly, but after myWord = myWord.Replace(ruChar, enChar) the string is unchanged.

My .txt file is UTF-8 encoded.

String.Replace() is going to be horribly inefficient: you'd have to call it once for every Cyrillic letter you want to replace. Use a Dictionary instead (no pun intended). For example:

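    // Requires: using System.Collections.Generic; using System.Text;
    // "..." marks the elided rest of both alphabets; the two lists must line up entry for entry.
    private const string Cyrillic = "AaБбВвГг...";
    private const string Latin = "A|a|B|b|V|v|G|g|...";
    private Dictionary<char, string> mLookup;

    public string Romanize(string russian) {
        if (mLookup == null) {
            // Build the lookup table once, on first use.
            mLookup = new Dictionary<char, string>();
            var replace = Latin.Split('|');
            for (int ix = 0; ix < Cyrillic.Length; ++ix) {
                mLookup.Add(Cyrillic[ix], replace[ix]);
            }
        }
        var buf = new StringBuilder(russian.Length);
        foreach (char ch in russian) {
            // TryGetValue does one lookup instead of ContainsKey plus the indexer.
            string latin;
            if (mLookup.TryGetValue(ch, out latin)) buf.Append(latin);
            else buf.Append(ch);   // not in the table: copy unchanged
        }
        return buf.ToString();
    }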

Note that the bars and the Split() call are needed in the Latin replacement because some Cyrillic letters need more than one Latin letter for their transliteration. The key idea is to use a dictionary for fast lookup and a string builder for fast string construction.
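To see the idea end to end, here is a minimal self-contained sketch with a deliberately tiny table. The class name RomanizeSketch and the particular spellings chosen (Zh, Shch) are assumptions for illustration, not a recommended transliteration standard; a real table would cover the full alphabet.

```csharp
using System.Collections.Generic;
using System.Text;

public static class RomanizeSketch
{
    // Illustrative subset only. Multi-letter values are why the value type is string, not char.
    private static readonly Dictionary<char, string> Lookup = new Dictionary<char, string>
    {
        { 'Б', "B" }, { 'б', "b" },
        { 'В', "V" }, { 'в', "v" },
        { 'Г', "G" }, { 'г', "g" },
        { 'Ж', "Zh" }, { 'ж', "zh" },
        { 'Щ', "Shch" }, { 'щ', "shch" }
    };

    public static string Romanize(string russian)
    {
        var buf = new StringBuilder(russian.Length);
        foreach (char ch in russian)
        {
            string latin;
            // Mapped characters are replaced; everything else is copied as-is.
            if (Lookup.TryGetValue(ch, out latin)) buf.Append(latin);
            else buf.Append(ch);
        }
        return buf.ToString();
    }
}
```

With this table, RomanizeSketch.Romanize("жбв") gives "zhbv"; characters that are not in the table (digits, punctuation, Latin letters) pass through unchanged.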

This United Nations document might be helpful.

Don't -1 me if this doesn't work; I'm just guessing that you must convert the English string you want to substitute to UTF-8, like so for example:

    // myWord is assumed to be declared already; redeclaring it here would not compile.
    myWord = Encoding.UTF8.GetString(Encoding.ASCII.GetBytes(myWord));
    myWord = myWord.Replace("слово", Encoding.UTF8.GetString(Encoding.ASCII.GetBytes("letter")));

I'm assuming that myWord is in ASCII, so the first line of code converts it to a UTF-8 string; leave that line out if it is already UTF-8.

The second line converts the English word to UTF-8 so that it can replace the Russian word.
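It may help to check what that round trip actually produces. In .NET a string is held as UTF-16 in memory whatever encoding the file used, so the encoding only matters at the point where bytes are decoded. The sketch below (class and method names are made up for illustration, and path is a hypothetical parameter) shows the ASCII round trip and a read that decodes the file as UTF-8 explicitly.

```csharp
using System.IO;
using System.Text;

public static class EncodingSketch
{
    // Round trip through ASCII: any character outside ASCII is replaced with '?'.
    public static string ThroughAscii(string text)
    {
        return Encoding.UTF8.GetString(Encoding.ASCII.GetBytes(text));
    }

    // Decode the file as UTF-8 once, at read time. "path" is a hypothetical parameter.
    public static string ReadUtf8(string path)
    {
        return File.ReadAllText(path, Encoding.UTF8);
    }
}
```

Encoding.ASCII cannot represent Cyrillic letters, so EncodingSketch.ThroughAscii("слово") comes back as "?????", while ThroughAscii("letter") is unchanged. If the file is read with Encoding.UTF8 up front, the string should need no further conversion before Replace.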

Very strange.

    Console.WriteLine("слово".Replace("слово", "word")); // prints 'word'

Works as expected. Maybe that's because I have Russian set as the system language for non-Unicode programs.
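If Replace finds nothing on another machine, one thing worth ruling out is that the character in the mapping list only looks like the one in the text: Latin 'a' (U+0061) and Cyrillic 'а' (U+0430) are different characters that render almost identically. Below is a small sketch for dumping code units so the two sides can be compared. CodePointSketch and Dump are hypothetical names, and this is only one possible cause, not a diagnosis of the original problem.

```csharp
using System.Linq;

public static class CodePointSketch
{
    // Formats every UTF-16 code unit of the text as U+XXXX, separated by spaces.
    public static string Dump(string text)
    {
        return string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));
    }
}
```

Calling Dump on both ruChar and the matching piece of myWord shows at a glance whether they are really the same character; Replace compares code units, not glyphs.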
