Fastest compression for serializable files in Java

Developer https://www.devze.com 2023-02-18 18:50 Source: Internet
I have a bunch of files (around 4000), each weighing roughly 1-5K, all created using Java's serialization mechanism.

I'd like to compress them and send them over a network as a single file. (They total around 200-300MB.)

I'm looking for a way to increase the compression/decompression speed without hurting the file size too much (the file still has to be sent over the network and stored on the server).

I'm currently using the zip package that comes with Apache Ant. I read that zip files store metadata for each file, so I guess zip won't be the best choice here.

So what's preferable? Gzip / Tar? Or not compressing at all? Which Java library would you recommend for this case?

Thanks in advance.


Not compressing at all would be fastest, but the resulting file size is the downside.

One reason why tar.gz produces smaller file sizes than zip alone is that gzip gets to work with a bigger buffer of data (the whole tar file), while in your case, zip only gets to work with the data from one file at a time (usually a lot less than the size of the tar file, if there are a lot of files).

So gzip gets to compress an entire book at once, while zip compresses each chapter separately and then binds the compressed chapters into a book. In other words, a compressed collection of objects is usually smaller than a collection of compressed objects.
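
If you want to see the size of this effect on your own data, a rough in-memory comparison could look like the sketch below. It only uses java.util.zip from the JDK. The class name, the method names and the sample payloads are made up for illustration; real serialized files will give different numbers, so treat it as a measuring aid rather than a benchmark.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Illustrative names only; none of this comes from the question.
final class SolidVsPerFile {

    // Gzip one byte array in memory and return the compressed bytes.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data, 0, data.length);
        }
        return buffer.toByteArray();
    }

    // Total size when every payload is compressed on its own (the zip-like case).
    static long sizeCompressedSeparately(List<byte[]> payloads) throws IOException {
        long total = 0;
        for (byte[] payload : payloads) {
            total += gzip(payload).length;
        }
        return total;
    }

    // Size when all payloads are concatenated and compressed once (the tar.gz-like case).
    static long sizeCompressedTogether(List<byte[]> payloads) throws IOException {
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (byte[] payload : payloads) {
            all.write(payload, 0, payload.length);
        }
        return gzip(all.toByteArray()).length;
    }

    // Made-up stand-ins for small serialized files that share a lot of structure.
    static List<byte[]> samplePayloads(int count) {
        List<byte[]> payloads = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            String text = "com.example.Record;id=" + i + ";name=item-" + i
                    + ";status=ACTIVE;tags=alpha,beta,gamma";
            payloads.add(text.getBytes(StandardCharsets.UTF_8));
        }
        return payloads;
    }

    public static void main(String[] args) throws IOException {
        List<byte[]> payloads = samplePayloads(200);
        System.out.println("separately: " + sizeCompressedSeparately(payloads) + " bytes");
        System.out.println("together:   " + sizeCompressedTogether(payloads) + " bytes");
    }
}
```

To try it on real data, replace samplePayloads with code that reads your serialized files into byte arrays.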

To produce a similar result to tar.gz, you can zip up the files in a first pass using the 'store' method (no compression), and then zip up the resulting zip file using the default deflate algorithm.
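
A minimal sketch of that two-pass idea with the JDK's java.util.zip classes (not the Ant zip package from the question) might look like this. It works on in-memory byte arrays to stay short; the class name, the method names and the entry names are assumptions for illustration. Note that the JDK requires the size and CRC-32 of a STORED entry to be set before the entry is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Illustrative names only; a real version would stream from and to files.
final class TwoPassZip {

    // Pass 1: bundle every file into one zip without compressing anything.
    static byte[] storeAll(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                byte[] data = file.getValue();
                CRC32 crc = new CRC32();
                crc.update(data);

                ZipEntry entry = new ZipEntry(file.getKey());
                entry.setMethod(ZipEntry.STORED);
                // STORED entries must carry their size and CRC-32 up front.
                entry.setSize(data.length);
                entry.setCompressedSize(data.length);
                entry.setCrc(crc.getValue());

                zip.putNextEntry(entry);
                zip.write(data, 0, data.length);
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    // Pass 2: wrap the stored archive as a single deflated entry in an outer zip.
    static byte[] deflateWhole(String innerName, byte[] storedZip) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(innerName)); // DEFLATED is the default method
            zip.write(storedZip, 0, storedZip.length);
            zip.closeEntry();
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (int i = 0; i < 100; i++) {
            String text = "com.example.Record;id=" + i + ";status=ACTIVE";
            files.put("object-" + i + ".ser", text.getBytes(StandardCharsets.UTF_8));
        }
        byte[] stored = storeAll(files);
        byte[] outer = deflateWhole("bundle.zip", stored);
        System.out.println("stored: " + stored.length + " bytes, outer: " + outer.length + " bytes");
    }
}
```

The receiver has to unzip twice: once to get the inner archive back, and once more to extract the individual files from it.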


A lot depends on the network that you are using. If it's over the internet, you might be better off sending, say, 50 zipped files rather than one. If you transfer the data in one file and the copy fails, you will have to send the whole thing again.

Copying as separate files will allow you to transfer some in parallel and to minimise the risk of a large upload failing.
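
As a rough sketch of the splitting step only (the zipping and the upload are not shown), the file list could be dealt into a fixed number of batches like this. The class name, the method name and the batch count of 50 are just illustrative choices.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper; each resulting batch would be zipped and sent on its own.
final class Batches {

    // Deal the items round-robin into batchCount lists, so batch sizes differ by at most one.
    static <T> List<List<T>> split(List<T> items, int batchCount) {
        if (batchCount <= 0) {
            throw new IllegalArgumentException("batchCount must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < batchCount; i++) {
            batches.add(new ArrayList<>());
        }
        for (int i = 0; i < items.size(); i++) {
            batches.get(i % batchCount).add(items.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 4000; i++) {
            names.add("object-" + i + ".ser"); // made-up file names
        }
        List<List<String>> batches = split(names, 50);
        System.out.println(batches.size() + " batches, first has " + batches.get(0).size() + " files");
    }
}
```

If there are fewer items than batches, some of the returned lists are empty and can simply be skipped.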


Another possibility might be switching to another serialization mechanism. JBoss Serialization is API- and functionality-compatible, but produces 30% less data.
