Does anyone have any ideas for how to programmatically and quickly check whether a zip file is corrupted, based on file size? Ideally the best way to check if a zip is corrupted is to do a CRC check, but this can take a long time, especially if there are a lot of large zip files. I would be happy just to be able to do a quick file size or header check.
Thanks in advance.
Use zip -T to test whether the file is corrupted. For a corrupted file, the output looks like this:
zip -T filename.zip
zip warning: missing end signature--probably not a zip file (did you
zip warning: remember to use binary mode when you transferred it?)
zip warning: (if you are trying to read a damaged archive try -F)
zip error: Zip file structure invalid (filename.zip)
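The "missing end signature" warning above suggests a cheap structural check: look for the end-of-central-directory signature near the end of the file without reading any compressed data. Below is a minimal Python sketch of that idea. The function name has_end_signature is made up for illustration, and the check only catches truncated or non-zip files, not damaged data inside the archive.

```python
import os

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory marker
EOCD_MIN_SIZE = 22              # fixed part of the end record, in bytes
MAX_COMMENT = 0xFFFF            # longest possible trailing archive comment


def has_end_signature(path):
    """Return True if the end-of-central-directory signature appears
    in the tail of the file. This is a quick sanity check only."""
    size = os.path.getsize(path)
    if size < EOCD_MIN_SIZE:
        return False
    # The end record sits at the very end, followed only by an optional comment,
    # so reading the last 22 + 65535 bytes at most is enough.
    tail_len = min(size, EOCD_MIN_SIZE + MAX_COMMENT)
    with open(path, "rb") as f:
        f.seek(size - tail_len)
        tail = f.read(tail_len)
    return tail.rfind(EOCD_SIGNATURE) != -1
```

Python's standard zipfile.is_zipfile() performs a similar magic-number check, so it may be enough on its own if you do not need control over how much of the file is read.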
DotNetZip, a free open source library for handling zip files in .NET languages, supports a CheckZip() method that does what you want. There are various levels of assurance available at your option. The basic level just checks consistency of metadata. The most complete level does a full extraction of the zip file into a bitbucket to verify that the actual compressed data is not corrupted.
Section 4.3.7 of this page says that the compressed size is stored in 4 bytes starting at byte 18 of the header. You could try reading that and comparing it to the size of the file.
However, I think it's pretty much useless for checking if the zip file is corrupted for two reasons:
- Some zip files contain more bytes than just the zip part. For example, self-extracting archives have an executable part yet they're still valid zip.
- The file can be corrupted without changing its size.
So, I suggest calculating the CRC for a guaranteed method of checking for corruption.
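For reference, reading the compressed-size field mentioned above could look roughly like the following Python sketch. It assumes the archive starts with a local file header (so it will not work as-is on self-extracting archives), and first_entry_compressed_size is a hypothetical helper name. It only reads the first entry, so it shares the limitations listed above.

```python
import struct

LOCAL_HEADER_SIGNATURE = b"PK\x03\x04"
LOCAL_HEADER_SIZE = 30  # fixed part of a local file header, in bytes


def first_entry_compressed_size(path):
    """Return the compressed size recorded in the first local file header,
    or None if the file does not start with such a header."""
    with open(path, "rb") as f:
        header = f.read(LOCAL_HEADER_SIZE)
    if len(header) < LOCAL_HEADER_SIZE or header[:4] != LOCAL_HEADER_SIGNATURE:
        return None
    # Little-endian unsigned 32-bit value at byte offset 18.
    # Caveat: this field is 0 when the writer used a trailing data descriptor
    # (general-purpose flag bit 3), and 0xFFFFFFFF for ZIP64 entries.
    (compressed_size,) = struct.unpack_from("<I", header, 18)
    return compressed_size
```

A value larger than the file itself would indicate truncation, but a plausible value says nothing about whether the data is intact.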
This might be a late answer, but if you are on the Windows command line and have 7-Zip installed, just add it to your system PATH and run this:
7z t file.zip
To check the whole archive 'for sure' you need to extract all the data, since the CRC stored in the archive is calculated over the uncompressed data. Even after that you cannot be 100% sure that it is not corrupted: a matching CRC is a good sign, but it does not guarantee that the data was not altered.
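If Python is available, its standard zipfile module can do this full pass: ZipFile.testzip() reads every member and compares it against the stored CRC. The sketch below wraps it in a hypothetical verify_zip helper. The return shape is my own choice, and as noted above it has to decompress everything, so it is not fast on large archives.

```python
import zipfile
import zlib


def verify_zip(path):
    """Return (True, None) if every member passes its CRC check,
    otherwise (False, reason)."""
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad member, or None.
            bad_member = zf.testzip()
    except (zipfile.BadZipFile, zlib.error) as exc:
        # Raised when the archive structure or a compressed stream is damaged.
        return False, f"unreadable archive: {exc}"
    if bad_member is not None:
        return False, f"CRC mismatch in member: {bad_member}"
    return True, None
```

Usage would be something like ok, reason = verify_zip("file.zip"), checking ok before trusting the archive.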