Convert File with known encoding to UTF-8

Developer https://www.devze.com 2023-01-29 04:07 Source: web
I need to read a text file into a String and then pass it, as an InputStream, to IFile.create (Eclipse). I have looked for an example of how to do this but still cannot figure it out, so I need your help!

Just for testing, I tried to convert the original text file to UTF-8 with this code:

FileInputStream fis = new FileInputStream(FilePath);
InputStreamReader isr = new InputStreamReader(fis);

Reader in = new BufferedReader(isr);
StringBuffer buffer = new StringBuffer();

int ch;
while ((ch = in.read()) > -1) {
    buffer.append((char)ch);
}
in.close();


FileOutputStream fos = new FileOutputStream(FilePath+".test.txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(buffer.toString());
out.close();

But even though the final *.test.txt file is UTF-8 encoded, the characters inside are corrupted.


You need to specify the encoding of the InputStreamReader using the Charset parameter.

                                    // ↓ whatever the input's encoding is
Charset inputCharset = Charset.forName("ISO-8859-1");
InputStreamReader isr = new InputStreamReader(fis, inputCharset);

This also works:

InputStreamReader isr = new InputStreamReader(fis, "ISO-8859-1");
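Putting the pieces together, here is a minimal sketch of the whole round trip, assuming the input really is ISO-8859-1 (substitute whatever the actual encoding is). The class name TranscodeToUtf8 and the method names are made up for illustration. toUtf8Stream shows one way to turn the decoded String into an InputStream of the kind IFile.create expects; the Eclipse call itself appears only in a comment because it needs a workspace to run.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class TranscodeToUtf8 {

    /** Reads the whole file as text, decoding its bytes with the given (known) charset. */
    public static String readText(Path source, Charset inputCharset) throws IOException {
        return new String(Files.readAllBytes(source), inputCharset);
    }

    /** Copies source to target, decoding with inputCharset and re-encoding as UTF-8. */
    public static void transcode(Path source, Charset inputCharset, Path target) throws IOException {
        try (BufferedReader in = Files.newBufferedReader(source, inputCharset);
             BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }

    /** Wraps the text's UTF-8 bytes in an InputStream. */
    public static InputStream toUtf8Stream(String text) {
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the input is ISO-8859-1. Here we create such a file ourselves.
        Path source = Files.createTempFile("latin1-", ".txt");
        Path target = Files.createTempFile("utf8-", ".txt");
        Files.write(source, "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1));

        transcode(source, StandardCharsets.ISO_8859_1, target);
        System.out.println(readText(target, StandardCharsets.UTF_8));

        // Hypothetical Eclipse usage (not runnable outside a workspace):
        //   String text = readText(source, StandardCharsets.ISO_8859_1);
        //   file.create(toUtf8Stream(text), true, monitor);
        // This assumes the IFile's charset is UTF-8.
    }
}
```

The key point is that the charset is named on both sides: once when decoding the input bytes and once when encoding the output, so nothing depends on the platform default.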

See also:

  • InputStreamReader(InputStream in, Charset cs)
  • Charset.forName(String charsetName)
  • Java: How to determine the correct charset encoding of a stream
  • How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII
  • GuessEncoding - only works for UTF-8, UTF-16LE, UTF-16BE, and UTF-32 ☹
  • ICU Charset Detector
  • cpdetector, free java codepage detection
  • JCharDet (Java port of Mozilla charset detector) ironically, that page does not render the apostrophe in "Mozilla's" correctly

SO search where I found all these links: https://stackoverflow.com/search?q=java+detect+encoding


You can get the default charset, which comes from the system the JVM is running on, at runtime via Charset.defaultCharset().
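This matters for the question because an InputStreamReader created without a charset argument decodes with that default. Below is a small sketch, with an illustrative class name (DefaultCharsetDemo) and helper (decode) that are not from the original post. It prints the default and shows how the same bytes turn into different text under different charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    /** Decodes raw bytes into text using the given charset. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // The charset that new InputStreamReader(in) uses when none is specified.
        System.out.println("Default charset: " + Charset.defaultCharset().name());

        // The UTF-8 encoding of U+00E9 (e with acute accent) is the two bytes C3 A9.
        byte[] utf8Bytes = {(byte) 0xC3, (byte) 0xA9};

        // Decoded as UTF-8 this is one character; as ISO-8859-1 it becomes two.
        System.out.println("As UTF-8:      " + decode(utf8Bytes, StandardCharsets.UTF_8).length() + " char(s)");
        System.out.println("As ISO-8859-1: " + decode(utf8Bytes, StandardCharsets.ISO_8859_1).length() + " char(s)");
    }
}
```

If the file's encoding differs from the printed default, reading it without an explicit charset produces exactly the kind of corruption described in the question.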
