How to count the number of columns required by a mixed Japanese-English string?

Developer https://www.devze.com 2023-03-22 21:42 Source: Internet
My string contains a mix of Japanese (double-width) and English (single-width) characters:

string str = "女性love";

In C#, my method has to count Japanese characters as two columns and English characters as one, so the above string should give me 8 columns:

2 + 2 + 1 + 1 + 1 + 1 = 8


You probably want something like this. It is very rough, but with a little work you can make it much nicer:

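    string str = "女性love";
    int iTotal = 0;

    foreach (char ch in str)
    {
        // 'A' is 65 and 'z' is 122. Besides the ASCII letters, this range also
        // covers the six characters [ \ ] ^ _ ` that sit between 'Z' and 'a'.
        if (ch >= 'A' && ch <= 'z')
            iTotal++;       // counted as single width
        else
            iTotal += 2;    // everything else is counted as double width
    }

    // iTotal == 8 in this case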

As for why System.Text.Encoding.UTF8.GetBytes(str).Length returns 10: that simply follows from the UTF-8 encoding specification. Read the entire Joel on Unicode article. The most important part with regard to this question is:

In UTF-8, every code point from 0-127 is stored in a single byte. Only code points 128 and above are stored using 2, 3, in fact, up to 6 bytes

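Check the code points of your Japanese characters and you will see why it returns 10: 女 (U+5973) and 性 (U+6027) each take 3 bytes in UTF-8, and the four ASCII letters take 1 byte each, so 3 + 3 + 1 + 1 + 1 + 1 = 10.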
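To make the byte arithmetic visible, here is a minimal sketch that reports the UTF-8 byte count of each character. The class and method names (Utf8ByteDemo, BytesPerChar) are made up for illustration, and the sketch assumes the input has no surrogate pairs.

```csharp
using System;
using System.Text;

public static class Utf8ByteDemo
{
    // Returns the number of UTF-8 bytes needed for each char of the input.
    // Assumption: every char is in the Basic Multilingual Plane, so encoding
    // one char at a time is valid (no surrogate pairs are split).
    public static int[] BytesPerChar(string s)
    {
        var counts = new int[s.Length];
        for (int i = 0; i < s.Length; i++)
        {
            counts[i] = Encoding.UTF8.GetByteCount(s[i].ToString());
        }
        return counts;
    }
}
```

Calling Utf8ByteDemo.BytesPerChar("女性love") should show 3 bytes for each kanji and 1 byte for each ASCII letter, which is why the byte count is a poor proxy for display columns.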

EDIT

Note that this code actually separates English letters from everything else, not only from Japanese characters. If you need to single out Japanese characters, for example because you may also have to deal with Arabic, Hebrew, Russian, or other scripts, you need to know the code-point ranges of the Japanese scripts.
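One way to act on that is to test each character against a few Unicode blocks. The following is only a sketch: ColumnCounter and IsWide are hypothetical names, and the ranges are a simplified selection of common Japanese blocks, not the full Unicode East Asian Width data.

```csharp
using System;

public static class ColumnCounter
{
    // Hypothetical helper: true for characters this sketch treats as double width.
    // Simplified ranges; half-width katakana (U+FF61..U+FF9F) is deliberately excluded.
    private static bool IsWide(char ch)
    {
        return (ch >= '\u3000' && ch <= '\u303F')   // CJK symbols and punctuation
            || (ch >= '\u3040' && ch <= '\u309F')   // Hiragana
            || (ch >= '\u30A0' && ch <= '\u30FF')   // Katakana
            || (ch >= '\u4E00' && ch <= '\u9FFF')   // CJK unified ideographs (kanji)
            || (ch >= '\uFF01' && ch <= '\uFF60');  // Fullwidth ASCII variants
    }

    // Counts two columns for wide characters and one column for everything else.
    public static int CountColumns(string s)
    {
        int total = 0;
        foreach (char ch in s)
        {
            total += IsWide(ch) ? 2 : 1;
        }
        return total;
    }
}
```

With these ranges, ColumnCounter.CountColumns("女性love") gives 8, while Russian or Arabic text is counted as single width instead of being lumped in with Japanese.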

Regards.


Try something like this:

int bCnt = System.Text.Encoding.UTF8.GetBytes(str).Length; //Select the appropriate encoding, if not UTF8
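As an illustration of picking a different encoding, Shift-JIS stores kanji and kana in two bytes and ASCII in one, so its byte count happens to match the column count for this string. The sketch below uses a made-up class name (ShiftJisWidth) and assumes .NET Core or .NET 5+, where the legacy code pages have to be registered through CodePagesEncodingProvider before "shift_jis" can be looked up. On .NET Framework the registration line is not needed and may not compile without the System.Text.Encoding.CodePages package.

```csharp
using System;
using System.Text;

public static class ShiftJisWidth
{
    // Uses the Shift-JIS byte count as a stand-in for display columns.
    // Assumption: running on .NET Core / .NET 5+, hence the provider registration.
    public static int CountColumns(string s)
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding sjis = Encoding.GetEncoding("shift_jis");
        return sjis.GetByteCount(s);
    }
}
```

This is a shortcut rather than a true width calculation: characters that Shift-JIS cannot represent are replaced by the encoder's fallback, so the result is only meaningful for text that fits the encoding.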