unzip huge gz file in Java and performance

I am unzipping a huge gz file in Java. The gz file is about 2 GB and the unzipped file is about 6 GB. From time to time the unzipping process takes forever (hours); sometimes it finishes in a reasonable time (under 10 minutes or quicker).

I have a fairly powerful box (8 GB RAM, 4 CPUs). Is there a way to improve the code below, or should I use a completely different library?

Also, I passed -Xms256m and -Xmx4g to the VM.

public static File unzipGZ(File file, File outputDir) {
    GZIPInputStream in = null;
    OutputStream out = null;
    File target = null;
    try {
        // Open the compressed file
        in = new GZIPInputStream(new FileInputStream(file));

        // Open the output file
        target = new File(outputDir, FileUtil.stripFileExt(file.getName()));
        out = new FileOutputStream(target);

        // Transfer bytes from the compressed file to the output file
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }

        // Close the file and stream
        in.close();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        if (out != null) {
            try {
                out.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
    return target;
}


I don't know how much buffering is applied by default, if any, but you might want to try wrapping both the input and the output in a BufferedInputStream / BufferedOutputStream. You could also try increasing your buffer size: 1 KB is a pretty small buffer. Experiment with different sizes, e.g. 16 KB, 64 KB, etc. A larger copy buffer should make the use of BufferedInputStream rather less important, of course.
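
For illustration, a minimal sketch of what that looks like, dropped in place of the stream construction and copy buffer in the original method (the 64 KB sizes are arbitrary starting points to experiment with, not values from the original answer):

InputStream in = new BufferedInputStream(
        new GZIPInputStream(new FileInputStream(file), 64 * 1024), 64 * 1024);
OutputStream out = new BufferedOutputStream(
        new FileOutputStream(target), 64 * 1024);
byte[] buf = new byte[64 * 1024]; // copy buffer, up from 1 KB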

On the other hand, I suspect this isn't really the problem. If it sometimes finishes in 10 minutes and sometimes takes hours, that suggests something very odd is going on. When it takes a very long time, is it actually making progress? Is the output file increasing in size? Is it using significant CPU? Is the disk constantly in use?

One side note: as you're closing in and out in the finally block, you don't need to do it in the try block as well.
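
On Java 7 or later, try-with-resources removes that bookkeeping entirely. A minimal sketch of the same method, keeping the original FileUtil.stripFileExt helper (the 64 KB buffer size is an illustrative choice):

import java.io.*;
import java.util.zip.GZIPInputStream;

public static File unzipGZ(File file, File outputDir) throws IOException {
    File target = new File(outputDir, FileUtil.stripFileExt(file.getName()));
    // try-with-resources closes both streams automatically,
    // even if an exception is thrown mid-copy
    try (InputStream in = new GZIPInputStream(new FileInputStream(file), 64 * 1024);
         OutputStream out = new BufferedOutputStream(new FileOutputStream(target), 64 * 1024)) {
        byte[] buf = new byte[64 * 1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    }
    return target;
}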


If you have 8 GB of RAM and the input file is only 2 GB, you could try using a memory-mapped file. Here is an example of how to do it.
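
A minimal sketch of that idea: map the compressed file read-only and feed the mapped buffer to GZIPInputStream through a small InputStream adapter (the adapter, method name, and 64 KB buffer sizes are illustrative assumptions, not from the original answer; note a single MappedByteBuffer is capped at 2 GB, so a larger file would need multiple mappings):

import java.io.*;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;
import java.util.zip.GZIPInputStream;

public static File unzipGZMapped(File file, File target) throws IOException {
    try (FileChannel channel = FileChannel.open(file.toPath(), StandardOpenOption.READ)) {
        // Map the whole compressed file into memory; channel.size() must fit
        // in an int (Integer.MAX_VALUE, i.e. 2 GB) for a single mapping
        MappedByteBuffer mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        // Adapt the mapped buffer to an InputStream so GZIPInputStream can consume it
        InputStream source = new InputStream() {
            @Override public int read() {
                return mapped.hasRemaining() ? (mapped.get() & 0xFF) : -1;
            }
            @Override public int read(byte[] b, int off, int len) {
                if (!mapped.hasRemaining()) return -1;
                int n = Math.min(len, mapped.remaining());
                mapped.get(b, off, n);
                return n;
            }
        };
        try (InputStream in = new GZIPInputStream(source, 64 * 1024);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target), 64 * 1024)) {
            byte[] buf = new byte[64 * 1024];
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
            }
        }
    }
    return target;
}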


Try using channels from java.nio: FileChannel has transferTo/transferFrom methods for moving bytes to or from another channel, so you don't have to copy them yourself, and the transfer will probably be quite optimized. See FileInputStream.getChannel().
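
The decompression itself still has to go through GZIPInputStream, but that stream can be wrapped in a ReadableByteChannel so the destination FileChannel can pull from it. A minimal sketch (the method name and chunk size are illustrative):

import java.io.*;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.util.zip.GZIPInputStream;

public static File unzipGZViaChannel(File file, File target) throws IOException {
    try (ReadableByteChannel src = Channels.newChannel(
             new GZIPInputStream(new FileInputStream(file), 64 * 1024));
         FileChannel dest = new FileOutputStream(target).getChannel()) {
        long pos = 0;
        long n;
        // For a blocking source, transferFrom returns 0 only at end of stream,
        // so keep pulling 1 MB chunks until it does
        while ((n = dest.transferFrom(src, pos, 1 << 20)) > 0) {
            pos += n;
        }
    }
    return target;
}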
