Retrieving HTML in KOI8_R

Developer https://www.devze.com 2023-03-23 09:10 Source: Internet
I want to retrieve some HTML that is encoded in KOI8_R. How can I retrieve it without corrupting the characters?

import java.io.*;
import java.net.URL;
import java.net.URLConnection;

public class htmlget {

    public static void main(String[] args) throws Exception {
        String test = "http://koi8.pp.ru/";
        URL website = new URL(test);
        URLConnection yc = website.openConnection();
        StringBuilder fileData = new StringBuilder(1000);
        // Decode the response bytes as KOI8-R
        BufferedReader in = new BufferedReader(
                new InputStreamReader(yc.getInputStream(), "KOI8_R"));

        char[] buf = new char[1024];
        int numRead = 0;
        while ((numRead = in.read(buf)) != -1) {
            fileData.append(buf, 0, numRead);
        }
        in.close();

        String text = fileData.toString();
        // Write the text to a file, encoded as KOI8-R
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream("foo.txt"), "KOI8_R"));
        out.write(text);
        // Write the same text to the console, also encoded as KOI8-R
        OutputStreamWriter wrt = new OutputStreamWriter(System.out, "KOI8_R");
        wrt.write(text);
        wrt.close();
        out.close();
    }

}

Both the console and the file display the Russian characters as "ÓÅÇÏÄÎÑ".
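
One way to see where a string like this can come from: it matches the bytes of a Russian word encoded in KOI8-R but displayed as ISO-8859-1. The sketch below assumes the console or editor is decoding as ISO-8859-1 (or a similar Latin code page), which the question does not confirm. The class and method names are illustrative, and the word "сегодня" is inferred from the garbled output, not stated in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Encode text as KOI8-R, then (wrongly) decode those bytes as ISO-8859-1,
    // which is what a Latin-1 console or editor effectively does.
    public static String viewKoi8AsLatin1(String text) {
        byte[] koi8Bytes = text.getBytes(Charset.forName("KOI8-R"));
        return new String(koi8Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // The Russian word is written with escapes so the source file encoding does not matter
        String word = "\u0441\u0435\u0433\u043e\u0434\u043d\u044f";
        String garbled = viewKoi8AsLatin1(word);
        // Print code points rather than glyphs so the console encoding is irrelevant
        for (char c : garbled.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();
    }
}
```

If the printed code points are those of "ÓÅÇÏÄÎÑ" (U+00D3 U+00C5 U+00C7 U+00CF U+00C4 U+00CE U+00D1), the bytes being written are probably valid KOI8-R and only the viewer's charset is mismatched.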


(...)
        in.close();

        String text = new String(fileData.toString().getBytes(), "KOI8_R");
        BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream("foo.txt"), "KOI8_R"));
        out.write(text);
(...)
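
Note that getBytes() with no argument encodes using the platform default charset, so the result of the snippet above can vary between machines. A minimal alternative sketch, assuming the goal is a file that common editors display correctly: decode the stream as KOI8-R exactly once and write it back out as UTF-8. The helper name transcode and the output file name foo-utf8.txt are illustrative, not from the original post.

```java
import java.io.*;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Koi8ToUtf8 {

    // Copy characters from in to out, decoding with 'from' and encoding with 'to'.
    public static void transcode(InputStream in, Charset from,
                                 OutputStream out, Charset to) throws IOException {
        try (Reader reader = new BufferedReader(new InputStreamReader(in, from));
             Writer writer = new BufferedWriter(new OutputStreamWriter(out, to))) {
            char[] buf = new char[1024];
            int numRead;
            while ((numRead = reader.read(buf)) != -1) {
                writer.write(buf, 0, numRead);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // In-memory KOI8-R bytes stand in for the HTTP response body
        byte[] koi8 = {(byte) 0xD3, (byte) 0xC5, (byte) 0xC7, (byte) 0xCF,
                       (byte) 0xC4, (byte) 0xCE, (byte) 0xD1};
        try (OutputStream file = new FileOutputStream("foo-utf8.txt")) {
            transcode(new ByteArrayInputStream(koi8),
                      Charset.forName("KOI8-R"), file, StandardCharsets.UTF_8);
        }
    }
}
```

With a real connection, yc.getInputStream() would take the place of the ByteArrayInputStream. The resulting file should then be opened as UTF-8.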