My question is pretty simple. I have an HTML document hosted in a WebBrowser control. When I select a Korean word using the MSHTML range property, I can read both range.htmlText and range.Text, and both show the Korean word. All I want to do is convert it to Unicode format. Is it possible?
FYI, I am doing all this in C# WinForms.
Could you provide a little more information? What format is the "Korean word" in when you read it? (I assume the same as the HTML document header.) Could you post a sample HTML page from which you are trying to read?
If the problem is simply that the string you are getting is in a different code page, you can use the Encoding classes in .NET to convert it. For example, perhaps your text is in iso-2022-kr. Here is a sample that converts your string, called "stringInKoreanIsoEncoding" in the code below:
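// Requires: using System.Text;
Encoding koreanEncoding = Encoding.GetEncoding(50225); // 50225 is the code page for iso-2022-kr
// Encode the string as iso-2022-kr bytes, then transcode those bytes to UTF-8
byte[] convertedToUtf8 = Encoding.Convert(koreanEncoding, Encoding.UTF8, koreanEncoding.GetBytes(stringInKoreanIsoEncoding));
// Decode the UTF-8 bytes back into a .NET string
string utf8String = Encoding.UTF8.GetString(convertedToUtf8);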
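It may also help to remember that a .NET string is stored internally as UTF-16, so once the Korean word displays correctly it is already Unicode. If "Unicode format" means the code unit values or a specific byte encoding, a minimal sketch could look like the one below. The class name UnicodeDemo, the helper ToUnicodeEscapes, and the hard-coded sample word are made up for illustration; in your case the string would come from the range instead.

```csharp
using System;
using System.Text;

static class UnicodeDemo
{
    // Hypothetical helper: renders each UTF-16 code unit as a \uXXXX escape.
    static string ToUnicodeEscapes(string text)
    {
        var sb = new StringBuilder();
        foreach (char c in text)
        {
            sb.Append("\\u").Append(((int)c).ToString("X4"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // Sample word (Hangul syllables U+D55C U+AE00); an assumption standing in
        // for the text read from the selected range.
        string word = "\uD55C\uAE00";

        // Code unit values as escapes: prints \uD55C\uAE00
        Console.WriteLine(ToUnicodeEscapes(word));

        // UTF-8 bytes of the same text: prints ED-95-9C-EA-B8-80
        byte[] utf8 = Encoding.UTF8.GetBytes(word);
        Console.WriteLine(BitConverter.ToString(utf8));

        // UTF-16 little-endian bytes (what .NET calls Encoding.Unicode)
        byte[] utf16 = Encoding.Unicode.GetBytes(word);
        Console.WriteLine(BitConverter.ToString(utf16));
    }
}
```

Which of these you want depends on what consumes the result: escapes are handy for embedding in source or JSON, while the byte arrays are what you would write to a file or send over the wire.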