How to retrieve the unicode decimal representation of the chars in a string containing hindi text?

Developer https://www.devze.com 2023-03-03 10:40 Source: Internet

I am using Visual Studio 2010 with C# to convert text into Unicode values. For example, I have a string abc = "मेरा". There are 4 characters in this string, and I need the Unicode value of each of the four. Please help me.

When you write code like string abc = "मेरा";, you already have it as Unicode (specifically, UTF-16), so you don't have to convert anything. If you want to access the individual characters, you can do that with a normal index: e.g. abc[1] is े (DEVANAGARI VOWEL SIGN E).

If you want to see the numeric representations of those characters, just cast them to integers. For example

abc.Select(c => (int)c)

gives the sequence of numbers 2350, 2375, 2352, 2366. If you want to see the hexadecimal representation of those numbers, use ToString():

abc.Select(c => ((int)c).ToString("x4"))

returns the sequence of strings "092e", "0947", "0930", "093e".
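The two expressions above are fragments. A minimal complete program, assuming .NET 4 or later and with the class and method names (CharValuesDemo, ToDecimal, ToHex) invented here for illustration, might look like this:

```csharp
using System;
using System.Linq;

static class CharValuesDemo
{
    // Casts each UTF-16 code unit (char) to its numeric value.
    public static int[] ToDecimal(string text)
    {
        return text.Select(c => (int)c).ToArray();
    }

    // Same values, formatted as four lowercase hexadecimal digits.
    public static string[] ToHex(string text)
    {
        return text.Select(c => ((int)c).ToString("x4")).ToArray();
    }

    static void Main()
    {
        string abc = "मेरा";
        Console.WriteLine(string.Join(", ", ToDecimal(abc))); // 2350, 2375, 2352, 2366
        Console.WriteLine(string.Join(", ", ToHex(abc)));     // 092e, 0947, 0930, 093e
    }
}
```

Select is an extension method, so the using System.Linq; line is required for it to compile.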

Note that when I said numeric representations, I actually meant their UTF-16 code units, since a C# char is one 16-bit UTF-16 code unit. For characters in the Basic Multilingual Plane (BMP), this is the same as their Unicode code point. The vast majority of characters in common use lie in the BMP, including the 4 Hindi characters presented here. Characters outside the BMP take two chars each (a surrogate pair).

If you wanted to handle characters in other planes too, you could use code like the following.

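// Encoding.UTF32 is little-endian: each code point occupies 4 bytes, lowest byte first.
byte[] bytes = Encoding.UTF32.GetBytes(abc);
int codePointCount = bytes.Length / 4;
int[] codePoints = new int[codePointCount];
// BitConverter follows the machine's byte order, which matches on little-endian hardware.
for (int i = 0; i < codePointCount; i++)
    codePoints[i] = BitConverter.ToInt32(bytes, i * 4);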

Since UTF-32 encodes every (21-bit) code point directly in 4 bytes, this gives you one integer per code point. (Maybe there is a more straightforward solution, but I haven't found one.)
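One possibly more direct route is char.ConvertToUtf32(string, int), which combines a surrogate pair into a single code point. The sketch below assumes the input is well-formed UTF-16 (ConvertToUtf32 throws on an unpaired surrogate). The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class CodePointDemo
{
    // Returns one int per Unicode code point, combining surrogate pairs.
    public static List<int> GetCodePoints(string text)
    {
        var codePoints = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // When text[i] is a high surrogate, this reads text[i] and text[i + 1] together.
            codePoints.Add(char.ConvertToUtf32(text, i));
            if (char.IsHighSurrogate(text[i]))
                i++; // skip the low surrogate that was just consumed
        }
        return codePoints;
    }

    static void Main()
    {
        foreach (int cp in GetCodePoints("मेरा"))
            Console.WriteLine("{0} (U+{0:X4})", cp);
    }
}
```

For the BMP-only string in the question this produces the same four values as the plain cast. The difference only shows up for characters outside the BMP, where two chars collapse into one code point.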

Since a .NET char is a UTF-16 code unit (which equals the Unicode code point for any character in the BMP), you can simply enumerate all characters in a string:

var abc = "मेरा";

foreach (var c in abc)
{
    Console.WriteLine((int)c);
}

resulting in

2350
2375
2352
2366

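Use

System.Text.Encoding.UTF8.GetBytes(abc)

That will return the UTF-8 encoded bytes of the string. Note that these are not the code point values themselves: each of these Devanagari characters is encoded as three bytes in UTF-8.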

If you are trying to convert files from a legacy encoding into Unicode:

Read the file, supplying the correct encoding of the source files, then write the file using the desired Unicode encoding scheme.

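    // Requires: using System.IO; using System.Text;
    // "x-iscii-de" (code page 57002) is the .NET Framework name for ISCII Devanagari.
    using (StreamReader reader = new StreamReader(@"C:\MyFile.txt", Encoding.GetEncoding("x-iscii-de")))
    using (StreamWriter writer = new StreamWriter(@"C:\MyConvertedFile.txt", false, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }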

If you are looking for a mapping of Devanagari characters to the Unicode code points:

You can find the Devanagari code chart (range U+0900 to U+097F) on the Unicode Consortium website.

Note that Unicode code points are traditionally written in hexadecimal. So rather than the decimal number 2350, the code point would be written as U+092E, and it appears as 092E on the code chart.
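To go from the decimal value to the notation used on the chart, formatting the number with "X4" is enough. A small sketch, with the helper name ToUPlus invented for this example:

```csharp
using System;

static class CodePointNotation
{
    // Formats a code point the way the Unicode charts write it, e.g. 2350 -> "U+092E".
    public static string ToUPlus(int codePoint)
    {
        return "U+" + codePoint.ToString("X4");
    }

    static void Main()
    {
        foreach (char c in "मेरा")
            Console.WriteLine("{0} -> {1}", (int)c, ToUPlus(c));
    }
}
```

"X4" pads to at least four uppercase hexadecimal digits, so code points above U+FFFF simply print with five or six digits.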

If you have the string s = "मेरा", then you already have the answer.

This string contains four code points in the BMP which in UTF-16 are represented by 8 bytes. You can access them by index with s[i], with a foreach loop etc.

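If you want the underlying 8 bytes, you can access them like so (GetBytes is an instance method, so it is called on Encoding.Unicode, the UTF-16 little-endian encoding):

string str = @"मेरा";
byte[] arr = System.Text.Encoding.Unicode.GetBytes(str);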
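To see what those 8 bytes look like, they can be printed with BitConverter.ToString. This is only a sketch. The class and method names are hypothetical, and it relies on Encoding.Unicode being UTF-16 little-endian, so the low byte of each code unit comes first.

```csharp
using System;
using System.Text;

static class Utf16BytesDemo
{
    // Encoding.Unicode is UTF-16, little-endian; GetBytes does not emit a byte order mark.
    public static byte[] GetUtf16Bytes(string text)
    {
        return Encoding.Unicode.GetBytes(text);
    }

    static void Main()
    {
        byte[] bytes = GetUtf16Bytes("मेरा");
        Console.WriteLine(bytes.Length);                 // 8
        Console.WriteLine(BitConverter.ToString(bytes)); // 2E-09-47-09-30-09-3E-09
    }
}
```

Each pair of bytes, read low byte first, is one of the code units 092E, 0947, 0930, 093E.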
