I want to retrieve some HTML that is encoded in KOI8-R. How can I retrieve it without corrupting the characters?
import java.io.*;
import java.net.URL;
import java.net.URLConnection;

public class htmlget {
    public static void main(String[] args) throws Exception {
        String test = "http://koi8.pp.ru/";
        URL website = new URL(test);
        URLConnection yc = website.openConnection();
        StringBuilder fileData = new StringBuilder(1000);
        BufferedReader in = new BufferedReader(
                new InputStreamReader(
                        yc.getInputStream(), "KOI8_R"));
        char[] buf = new char[1024];
        int numRead = 0;
        while ((numRead = in.read(buf)) != -1) {
            fileData.append(buf, 0, numRead);
        }
        in.close();
        String text = fileData.toString();
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("foo.txt"), "KOI8_R"));
        out.write(text);
        OutputStreamWriter wrt = new OutputStreamWriter(System.out, "KOI8_R");
        wrt.write(text);
        wrt.close();
        out.close();
    }
}
The console and the file both display Russian characters as "ÓÅÇÏÄÎÑ".
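For what it's worth, "ÓÅÇÏÄÎÑ" is exactly what the KOI8-R bytes of the Russian word "сегодня" look like when they are displayed as ISO-8859-1. That suggests the bytes themselves may be fine and only the viewer (console or editor) is assuming a different encoding. A minimal sketch to check this, where the class name `MojibakeCheck` and the helper `koi8SeenAsLatin1` are made up for illustration:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the original program.
class MojibakeCheck {
    // Encode the text as KOI8-R, then decode the very same bytes as
    // ISO-8859-1, which is what a Latin-1 console or editor effectively does.
    static String koi8SeenAsLatin1(String text) {
        byte[] koi8Bytes = text.getBytes(Charset.forName("KOI8-R"));
        return new String(koi8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Russian word for "today", written with Unicode escapes so the
        // source file encoding does not matter.
        String word = "\u0441\u0435\u0433\u043e\u0434\u043d\u044f";
        String seen = koi8SeenAsLatin1(word);
        // Print code points instead of glyphs so the result does not depend
        // on the console encoding either.
        for (char c : seen.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

The seven code points printed are those of Ó, Å, Ç, Ï, Ä, Î and Ñ, i.e. the same garbled string as in the output above.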
(...)
in.close();
String text = new String(fileData.toString().getBytes(), "KOI8_R");
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream("foo.txt"), "KOI8_R"));
out.write(text);
(...)
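One thing that may be worth trying instead: keep the KOI8_R decoding on the input side, but write the output in an encoding the viewer is known to use, such as UTF-8. Note that `fileData.toString().getBytes()` encodes with the platform default charset, so decoding those bytes again as KOI8_R will not restore the text unless the default charset happens to be KOI8-R. The sketch below rests on assumptions: the class name `Koi8ToUtf8` and the helpers `decode` and `writeUtf8` are hypothetical, a byte array stands in for `yc.getInputStream()`, and the console or editor is assumed to read UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class for illustration only.
class Koi8ToUtf8 {
    static final Charset KOI8_R = Charset.forName("KOI8-R");

    // Decode a KOI8-R byte stream into a Java String (UTF-16 in memory).
    static String decode(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, KOI8_R)) {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                sb.append(buf, 0, numRead);
            }
        }
        return sb.toString();
    }

    // Write the text as UTF-8 (assumption: the viewer expects UTF-8).
    static void writeUtf8(Path target, String text) throws IOException {
        try (Writer writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            writer.write(text);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the network stream: seven KOI8-R encoded bytes.
        byte[] koi8 = {(byte) 0xD3, (byte) 0xC5, (byte) 0xC7, (byte) 0xCF,
                       (byte) 0xC4, (byte) 0xCE, (byte) 0xD1};
        String text = decode(new ByteArrayInputStream(koi8));
        writeUtf8(Paths.get("foo.txt"), text);
        // Console output with an explicit encoding instead of the default.
        PrintStream console = new PrintStream(System.out, true, "UTF-8");
        console.println(text);
    }
}
```

If the console really uses some other encoding (for example a Windows code page), the charset name passed to `PrintStream` would need to match that instead.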