I am unzipping a huge .gz file in Java. The compressed file is about 2 GB and the unzipped file is about 6 GB. From time to time the unzipping process takes forever (hours); sometimes it finishes in a reasonable time (under 10 minutes or quicker).
I have a fairly powerful box (8 GB RAM, 4 CPUs). Is there a way to improve the code below, or should I use a completely different library? Also, I passed -Xms256m and -Xmx4g to the VM.

public static File unzipGZ(File file, File outputDir) {
    GZIPInputStream in = null;
    OutputStream out = null;
    File target = null;
    try {
        // Open the compressed file
        in = new GZIPInputStream(new FileInputStream(file));
        // Open the output file
        target = new File(outputDir, FileUtil.stripFileExt(file.getName()));
        out = new FileOutputStream(target);
        // Transfer bytes from the compressed file to the output file
        byte[] buf = new byte[1024];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
        // Close the file and stream
        in.close();
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        if (out != null) {
            try {
                out.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
    return target;
}
I don't know how much buffering is applied by default, if any, but you might want to try wrapping both the input and the output in a BufferedInputStream / BufferedOutputStream. You could also try increasing your buffer size: 1K is a pretty small buffer. Experiment with different sizes, e.g. 16K, 64K etc. These should make the use of BufferedInputStream rather less important, of course.
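For example, here is a minimal sketch of the same copy loop with buffered streams and a bigger transfer buffer; the 64K sizes are just a starting point to experiment with, not a recommendation:

import java.io.*;
import java.util.zip.GZIPInputStream;

public class BufferedGunzip {
    // Same transfer loop as the question, but with the raw file stream
    // buffered beneath the GZIP stream, a buffered output stream, and a
    // 64K transfer buffer instead of 1K.
    static void gunzip(File file, File target) throws IOException {
        InputStream in = new GZIPInputStream(
                new BufferedInputStream(new FileInputStream(file), 64 * 1024));
        OutputStream out = new BufferedOutputStream(
                new FileOutputStream(target), 64 * 1024);
        byte[] buf = new byte[64 * 1024];
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
        }
        in.close();
        out.close();
    }
}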
On the other hand, I suspect this isn't really the problem. If it sometimes finishes in 10 minutes and sometimes takes hours, that suggests something very odd is going on. When it takes a very long time, is it actually making progress? Is the output file increasing in size? Is it using significant CPU? Is the disk constantly in use?
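One cheap way to answer the progress question is to log how much has been written at intervals. A sketch, with an arbitrary 5-second reporting interval:

import java.io.*;

public class ProgressCopy {
    // The question's copy loop, with coarse progress logging so you can
    // tell whether the slow runs are stalled or just slow.
    static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        long lastReport = System.currentTimeMillis();
        int len;
        while ((len = in.read(buf)) > 0) {
            out.write(buf, 0, len);
            total += len;
            long now = System.currentTimeMillis();
            if (now - lastReport >= 5000) {   // report roughly every 5 seconds
                System.out.println("written " + (total >> 20) + " MB so far");
                lastReport = now;
            }
        }
    }
}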
One side note: as you're closing in and out in the finally blocks, you don't need to do it in the try block as well.
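If you're on Java 7 or later, try-with-resources avoids the manual close bookkeeping entirely; a minimal sketch:

import java.io.*;
import java.util.zip.GZIPInputStream;

public class TwrGunzip {
    // Sketch: try-with-resources (Java 7+) closes both streams
    // automatically, even when an exception is thrown mid-copy.
    static void gunzip(File file, File target) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(file));
             OutputStream out = new FileOutputStream(target)) {
            byte[] buf = new byte[64 * 1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        }
    }
}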
If you have 8 gigs of RAM and the input file is only 2 gigs, you could try to use a memory-mapped file. Here is an example of how to do it.
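A rough sketch of one way this could look. GZIPInputStream only reads from InputStreams, so the mapped buffer needs a small adapter (ByteBufferInputStream below is a hypothetical helper, not a JDK class); note also that a single FileChannel.map() call is limited to Integer.MAX_VALUE bytes, so a file right at 2 GB may need to be mapped in windows:

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.zip.GZIPInputStream;

public class MappedGunzip {
    // Adapter exposing a ByteBuffer as an InputStream, so the mapped
    // region can feed GZIPInputStream.
    static class ByteBufferInputStream extends InputStream {
        private final ByteBuffer buf;
        ByteBufferInputStream(ByteBuffer buf) { this.buf = buf; }
        @Override public int read() {
            return buf.hasRemaining() ? (buf.get() & 0xFF) : -1;
        }
        @Override public int read(byte[] b, int off, int len) {
            if (!buf.hasRemaining()) return -1;
            len = Math.min(len, buf.remaining());
            buf.get(b, off, len);
            return len;
        }
    }

    static void gunzip(File file, File target) throws IOException {
        try (FileChannel ch = new FileInputStream(file).getChannel();
             OutputStream out = new FileOutputStream(target)) {
            // Map the compressed file read-only; the OS pages it in on demand.
            MappedByteBuffer mapped =
                    ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            InputStream in = new GZIPInputStream(new ByteBufferInputStream(mapped));
            byte[] buf = new byte[64 * 1024];
            int len;
            while ((len = in.read(buf)) > 0) {
                out.write(buf, 0, len);
            }
        }
    }
}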
Try to use channels from java.nio; FileChannel has methods to transfer bytes to and from other channels, so you don't have to copy them yourself, and the transfer will probably be quite optimized. See FileInputStream.getChannel().
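The decompression itself still has to go through GZIPInputStream, but Channels.newChannel can wrap it so the decompressed bytes are pumped straight into the output file with FileChannel.transferFrom. A sketch, with an assumed 1 MB chunk size:

import java.io.*;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.ReadableByteChannel;
import java.util.zip.GZIPInputStream;

public class ChannelGunzip {
    static void gunzip(File file, File target) throws IOException {
        try (ReadableByteChannel src = Channels.newChannel(
                     new GZIPInputStream(new FileInputStream(file)));
             FileChannel dest = new FileOutputStream(target).getChannel()) {
            long pos = 0, n;
            // transferFrom pulls decompressed bytes in chunks; for a blocking
            // source channel, a return of 0 means end of stream.
            while ((n = dest.transferFrom(src, pos, 1024 * 1024)) > 0) {
                pos += n;
            }
        }
    }
}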