
How to deal with UTF-16LE encoded text file using Java? or convert it to ASCII?

Developer https://www.devze.com 2023-03-09 19:01 Source: web

I am sorry if this has been asked before. I am trying to process a text file using Java. The text file is exported from MS SQL Server. When I open it in PSPad (a text editor that can also display any file in hex), it tells me that my text file is in UTF-16LE. Since I am getting it from someone else, that is quite possible.

Now my Java program is not able to deal with that format. Is there any way to convert my text file to ASCII, or to do some preprocessing so that my program can read it? I CAN modify the file.

Any help is greatly appreciated.

Thanks.

EDIT 1

I wrote this program, but it is not working as expected. When I view the output file in PSPad, every character takes 2 bytes, e.g. '2' is 32 00 instead of just 32, 'M' is 4D 00 instead of just 4D, etc. PSPad, though, says the encoding of the output file is UTF-8. I am kind of confused here. Can anyone tell me what I am doing wrong?

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;

public class ConvertFile {

    public static void main(String[] args) {
        // Decode input.txt as UTF-16LE and write every line to output.txt as UTF-8.
        // try-with-resources closes both streams even if an exception is thrown,
        // and the output file is opened once instead of once per line.
        try (BufferedReader br = new BufferedReader(
                     new InputStreamReader(new FileInputStream("input.txt"), "UTF-16LE"));
             BufferedWriter fbw = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream("output.txt"), "UTF-8"))) {
            String strLine;
            // Read the file line by line
            while ((strLine = br.readLine()) != null) {
                writeToFile(fbw, strLine);
            }
        } catch (IOException e) {
            System.err.println("Error: " + e.getMessage());
        }

        System.out.println("done.");
    }

    static void writeToFile(BufferedWriter fbw, String str) throws IOException {
        fbw.write(str);
        // readLine() strips the line terminator, so write one back
        fbw.newLine();
    }
}

EDIT 2

Here are the snapshots:

[Screenshot: input file in PSPad (a free hex viewer)]

[Screenshot: output file in PSPad]

[Screenshot: what I was expecting to see]
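Without the screenshots, the same check can be done from Java. Below is a minimal sketch that prints the first bytes of a file in hex and reports a byte-order mark if one is present. The class name `HexPeek`, its helpers `toHex` and `describeBom`, and the 16-byte limit are hypothetical choices for illustration, and `readNBytes` assumes Java 9 or later. A UTF-16LE file usually starts with the BOM FF FE, and ASCII characters encoded as UTF-16LE each appear followed by a 00 byte (e.g. '2' as 32 00).

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexPeek {

    // Hypothetical helper: names the byte-order mark at the start of buf, if any.
    static String describeBom(byte[] buf, int len) {
        if (len >= 3 && (buf[0] & 0xFF) == 0xEF && (buf[1] & 0xFF) == 0xBB && (buf[2] & 0xFF) == 0xBF) {
            return "UTF-8 BOM";
        }
        if (len >= 2 && (buf[0] & 0xFF) == 0xFF && (buf[1] & 0xFF) == 0xFE) {
            return "UTF-16LE BOM";
        }
        if (len >= 2 && (buf[0] & 0xFF) == 0xFE && (buf[1] & 0xFF) == 0xFF) {
            return "UTF-16BE BOM";
        }
        return "no BOM";
    }

    // Formats the first len bytes as space-separated two-digit hex, like a hex viewer.
    static String toHex(byte[] buf, int len) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < len; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", buf[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String name = args.length > 0 ? args[0] : "input.txt";
        byte[] buf = new byte[16];
        int len;
        try (InputStream in = Files.newInputStream(Paths.get(name))) {
            // Reads up to 16 bytes; returns fewer if the file is shorter (Java 9+).
            len = in.readNBytes(buf, 0, buf.length);
        }
        System.out.println(toHex(buf, len) + "  (" + describeBom(buf, len) + ")");
    }
}
```

Running it as `java HexPeek input.txt` and then `java HexPeek output.txt` shows whether the 00 bytes are really in the output file or only in the input.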


Create an InputStreamReader for charset UTF-16LE and you will be all set.
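A minimal sketch of that approach, assuming the file may start with a UTF-16LE byte-order mark. Java's `UTF-16LE` charset does not strip a BOM when decoding; it returns it as the character U+FEFF, so the sketch drops it by hand. The class and method names (`Utf16LeReader`, `readLines`) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf16LeReader {

    // Decodes a UTF-16LE stream into lines, dropping a leading BOM if present.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16LE))) {
            String line;
            while ((line = br.readLine()) != null) {
                // The UTF-16LE decoder keeps the BOM as U+FEFF; strip it from the first line only.
                if (lines.isEmpty() && line.startsWith("\uFEFF")) {
                    line = line.substring(1);
                }
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // "2M" followed by CRLF, as UTF-16LE with a BOM: FF FE 32 00 4D 00 0D 00 0A 00
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 0x32, 0x00, 0x4D, 0x00, 0x0D, 0x00, 0x0A, 0x00};
        System.out.println(readLines(new ByteArrayInputStream(sample))); // prints [2M]
    }
}
```

For a real file, pass `new FileInputStream("input.txt")` instead of the in-memory sample.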


InputStreamReader will let you load your UTF-16LE text into memory. You can then perform all the string manipulations you need. Then you can save the result in ASCII format using OutputStreamWriter. Use Charset to select the encodings.
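A minimal sketch of that round trip, working on a byte array so it runs without files. The class and method names are hypothetical. ASCII cannot represent every character: in the JDK's implementation, an `OutputStreamWriter` created with a `Charset` replaces unmappable characters with the charset's replacement, which for `US-ASCII` is `?`. The BOM is dropped first so it does not turn into a stray `?`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class Utf16LeToAscii {

    // Decodes UTF-16LE bytes to a String, then re-encodes that String as US-ASCII.
    static byte[] convert(byte[] utf16le) throws IOException {
        String text = new String(utf16le, StandardCharsets.UTF_16LE);
        // Drop a leading BOM (U+FEFF); otherwise it would be written as '?'.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        // Any string manipulation would go here.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.US_ASCII)) {
            w.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "2M" plus 'é' (U+00E9), which has no ASCII equivalent
        byte[] in = {0x32, 0x00, 0x4D, 0x00, (byte) 0xE9, 0x00};
        System.out.println(new String(convert(in), StandardCharsets.US_ASCII)); // prints 2M?
    }
}
```

If losing non-ASCII characters is not acceptable, encoding to UTF-8 instead keeps them while still writing plain ASCII characters as single bytes.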


Just found a solution.

http://www.fileformat.info/convert/text/utf2utf.htm

It lets you upload a file and convert it between encodings.

It's not a permanent solution, though, since my file is 700MB+. So I will try out some of the solutions posted by others.
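For a 700MB+ file there is no need to load everything into memory or upload it anywhere. A minimal sketch of a streaming converter that copies characters through a fixed-size buffer, so memory use stays constant and line endings are preserved exactly. The class name `StreamingRecode`, the 8192-char buffer, and the file names are assumptions. It uses `InputStreamReader`/`OutputStreamWriter` rather than `Files.newBufferedWriter` because the latter throws on unmappable characters instead of replacing them; with `US_ASCII` as the target, unmappable characters become `?`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class StreamingRecode {

    // Re-encodes src (UTF-16LE) into dst using the target charset, one chunk at a time.
    static void recode(Path src, Path dst, Charset target) throws IOException {
        try (Reader in = new BufferedReader(
                     new InputStreamReader(Files.newInputStream(src), StandardCharsets.UTF_16LE));
             Writer out = new BufferedWriter(
                     new OutputStreamWriter(Files.newOutputStream(dst), target))) {
            char[] buf = new char[8192];
            boolean first = true;
            int n;
            while ((n = in.read(buf)) != -1) {
                int start = 0;
                // Skip a BOM (U+FEFF) only if it is the very first character of the file.
                if (first && n > 0 && buf[0] == '\uFEFF') {
                    start = 1;
                }
                first = false;
                out.write(buf, start, n - start);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        recode(Paths.get("input.txt"), Paths.get("output.txt"), StandardCharsets.US_ASCII);
        System.out.println("done.");
    }
}
```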

This small tool helps:

http://www.kalytta.com/tools.php
