String received as UTF-8 doesn't get displayed correctly

Developer https://www.devze.com 2022-12-30 04:59 Source: Internet
I want to know how to read a string in Java from a file that contains letters from different languages.

I decode the input as UTF-8. Letters from some languages come through correctly, but Latin letters are not displayed correctly.

So how can I read letters from every language correctly?

Alternatively, is there another encoding that would let me read letters from every language?

Here's my code:

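import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL url = new URL("http://google.cm");
URLConnection urlc = url.openConnection();

// Decodes the response as UTF-8, whatever charset the server actually sends
BufferedReader buffer = new BufferedReader(new InputStreamReader(urlc.getInputStream(), "UTF-8"));
StringBuilder builder = new StringBuilder();
int charRead; // read() returns one decoded char as an int, or -1 at end of stream
while ((charRead = buffer.read()) != -1) {
    builder.append((char) charRead);
}
buffer.close();

String text = builder.toString();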

When I display "text", some of the letters are not shown correctly.


Reading a UTF-8 file is fairly simple in Java:

Reader r = new InputStreamReader(new FileInputStream(filename), "UTF-8"); 

If that isn't working, the issue lies elsewhere.

EDIT: According to iconv, Google Cameroon is serving invalid UTF-8. It seems to actually be ISO-8859-1.
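To illustrate the kind of mismatch described here, the following is a minimal sketch (not taken from the original post) of what happens when bytes encoded as ISO-8859-1 are decoded as UTF-8. The class name EncodingMismatchDemo and the sample word are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows the effect of decoding bytes with the wrong charset.
final class EncodingMismatchDemo {

    // Decodes raw bytes with the given charset. Malformed input is replaced
    // with U+FFFD rather than raising an exception.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an accented e; in ISO-8859-1 that letter is the single byte 0xE9.
        byte[] raw = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);

        // Correct charset: the accented letter survives.
        System.out.println(decode(raw, StandardCharsets.ISO_8859_1));

        // Wrong charset: 0xE9 alone is not a valid UTF-8 sequence,
        // so the last character becomes the replacement character U+FFFD.
        System.out.println(decode(raw, StandardCharsets.UTF_8));
    }
}
```

Plain ASCII letters are encoded identically in both charsets, so only accented Latin letters are affected, which would match the symptom in the question.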

EDIT2: Actually, I was wrong. It serves (and declares) valid UTF-8 if the User-Agent header contains "Mozilla/5.0" (or higher), but valid ISO-8859-1 in (some) other cases. The best bet is to call URLConnection.getContentType() and check the declared charset before decoding.
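One way to act on that advice is sketched below. It is only a rough illustration: the class ContentTypeCharset and its methods fromContentType and fetch are hypothetical names, and falling back to ISO-8859-1 when the server declares no charset is an assumption, not something the original answer specifies.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: picks the decoder from the declared Content-Type.
final class ContentTypeCharset {

    // Extracts the charset parameter from a value such as
    // "text/html; charset=ISO-8859-1". Returns the fallback when the
    // header is missing, has no charset, or names an unknown charset.
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers illegal and unsupported charset names.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Reads the whole response body, decoded with the declared charset.
    static String fetch(String address) throws IOException {
        URLConnection conn = new URL(address).openConnection();
        // Assumption: use ISO-8859-1 when the server declares nothing.
        Charset charset = fromContentType(conn.getContentType(), StandardCharsets.ISO_8859_1);
        StringBuilder builder = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                builder.append(buf, 0, n);
            }
        }
        return builder.toString();
    }
}
```

With this in place, the question's code would reduce to something like String text = ContentTypeCharset.fetch("http://google.cm");. A page may also declare its charset in a meta tag, which this sketch does not inspect.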
