Download UTF-8 web page into String

Developer https://www.devze.com 2023-03-26 10:54 Source: Internet
This is a newbie question.

I read the following question to download a web page whose contents are encoded in UTF-8. There, the page is converted into a byte array, whereas I'm using a String to read the contents of the page.

I need to turn UTF-8 into Latin1/ANSI since that's what RichText and MessageBox seem to use (I'm getting funny characters).

Is there a more direct way to download a UTF-8 page and convert it into ANSI/Latin1?

Thank you.


Edit: When calling MessageBox, accented characters are not shown as expected:

Content = CStr(e.Result)
'Théâtre, Métro
MessageBox.Show(Content)


String in .NET uses Unicode all the way, so you should not have to convert it to Latin1/ANSI for display. The important thing is that when you download the page, you specify that the data comes from a UTF-8 source, so that the bytes are decoded correctly.
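To illustrate why the source encoding matters, here is a minimal sketch (not taken from the original thread) that decodes the same UTF-8 bytes twice: once as Latin1, which produces the "funny characters", and once as UTF-8. The module name and the sample text are assumptions chosen for illustration, and the source file is assumed to be saved in an encoding that preserves the accented literal.

```vb
Imports System
Imports System.Text

Module EncodingDemo
    Sub Main()
        ' Bytes as a server would send them for a UTF-8 encoded page.
        Dim utf8Bytes As Byte() = Encoding.UTF8.GetBytes("Théâtre, Métro")

        ' Wrong: each byte of a two-byte UTF-8 sequence becomes its own Latin1 character.
        Dim decodedAsLatin1 As String = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes)

        ' Right: the two-byte sequences are recombined into single characters.
        Dim decodedAsUtf8 As String = Encoding.UTF8.GetString(utf8Bytes)

        Console.WriteLine(decodedAsLatin1.Length) ' longer than the original: 5 extra characters
        Console.WriteLine(decodedAsUtf8 = "Théâtre, Métro")
    End Sub
End Module
```

Once the bytes have been decoded with the right encoding, the resulting String can be passed to MessageBox.Show or a RichTextBox as-is.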

MSDN has a sample on loading UTF-8 encoded data into a string:

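Private Function ReadAuthor(binary_file As Stream) As String
     Dim encoding As System.Text.Encoding = System.Text.Encoding.UTF8
     ' Read up to 30 bytes from the binary file
     Dim buffer(29) As Byte ' VB array bounds are inclusive: indices 0 to 29
     Dim bytesRead As Integer = binary_file.Read(buffer, 0, buffer.Length)
     ' Decode only the bytes that were actually read, using UTF-8
     Return encoding.GetString(buffer, 0, bytesRead)
End Function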
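The sample above reads only a fixed number of bytes. For a whole page held in a stream, one possible approach is to let a StreamReader do the decoding. This is a sketch under assumptions: ReadAllUtf8 is a hypothetical helper name, and the MemoryStream stands in for a real response stream.

```vb
Imports System
Imports System.IO
Imports System.Text

Module StreamDemo
    ' Hypothetical helper: decodes the entire stream as UTF-8 text.
    Function ReadAllUtf8(source As Stream) As String
        Using reader As New StreamReader(source, Encoding.UTF8)
            Return reader.ReadToEnd()
        End Using
    End Function

    Sub Main()
        ' Stand-in for a downloaded page: UTF-8 bytes held in memory.
        Dim bytes As Byte() = Encoding.UTF8.GetBytes("Théâtre, Métro")
        Using ms As New MemoryStream(bytes)
            Console.WriteLine(ReadAllUtf8(ms) = "Théâtre, Métro")
        End Using
    End Sub
End Module
```

Because the reader consumes the stream to its end, a multi-byte character cannot be cut in half the way it can with a fixed-size buffer.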

Update

When using WebClient.DownloadString, the conversion to a string takes place automatically, and code similar to the sample above is not needed. The automatic conversion uses the encoding specified by WebClient.Encoding, so the problem should be solved by setting the WebClient object's Encoding property to UTF-8 before starting the download:

client.Encoding = System.Text.Encoding.UTF8
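Put together, a complete download might look like the following sketch. The URL is a placeholder and the module name is an assumption. The synchronous DownloadString is used here for brevity; the same Encoding property should also govern DownloadStringAsync, where the text arrives in e.Result as in the question.

```vb
Imports System
Imports System.Net
Imports System.Text

Module DownloadDemo
    Sub Main()
        ' Placeholder address; replace with the UTF-8 page you want to fetch.
        Dim address As String = "http://www.example.com/"

        Using client As New WebClient()
            ' Set the encoding before the download so the bytes are decoded as UTF-8.
            client.Encoding = Encoding.UTF8

            Dim content As String = client.DownloadString(address)

            ' In a Windows Forms application this could be MessageBox.Show(content).
            Console.WriteLine(content)
        End Using
    End Sub
End Module
```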