For a build script, I need to work with source packages of a certain version. To avoid bundling large source archives, the script just stores their SHA-1 checksums and downloads the archives automatically. This works very well for official releases such as
http://download.videolan.org/pub/videolan/libdca/0.0.5/libdca-0.0.5.tar.bz2
However, some packages don't provide an official release, so I download a well-tested version from the version control system. For instance, Gitweb provides a handy "snapshot" feature for downloading a tar.gz archive:
http://git.videolan.org/?p=libbluray.git;a=snapshot;h=cf9ee593f;sf=tgz
Unfortunately, this URL returns a slightly different file on each request. The tar archive inside is always byte-for-byte identical and is always gzip-compressed the same way, but the timestamp field near the beginning of the gzip header differs between downloads.
Those few bytes change the checksum on every download, so the script can no longer verify the integrity of the downloaded source archive.
How can I circumvent this issue?
Just zcat $archive | sha1sum
it if the tar stream is stable. Otherwise, you could check out the exact commit with git (perhaps as a shallow clone, --depth 1), or store pristine-tar deltas that let you rebuild a byte-identical archive.
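To make this concrete, here is a minimal sketch of the verification step, hashing the decompressed tar stream so the volatile gzip timestamp never enters the checksum. The function name and variables are illustrative, not from the original script:

```shell
#!/bin/sh
# verify_snapshot ARCHIVE EXPECTED_SHA1
# Hashes the *decompressed* tar stream, so differing gzip header
# timestamps across downloads do not affect the checksum.
# (Hypothetical helper; names are not from the asker's script.)
verify_snapshot() {
    archive=$1
    expected=$2
    actual=$(zcat "$archive" | sha1sum | cut -d' ' -f1)
    [ "$actual" = "$expected" ]
}

# Hypothetical usage:
# verify_snapshot libbluray-cf9ee593f.tar.gz 0123456789abcdef... || exit 1
```

The stored checksum must then be computed the same way (zcat piped into sha1sum), not over the compressed file.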
The zcat solution is appropriate, but if for any reason you are worried that zcat wastes CPU, you can skip the fixed 10-byte header at the start of the gzip file, which contains the timestamp (see http://www.gzip.org/zlib/rfc-gzip.html#file-format), and hash the rest:
tail --bytes=+11 $archive | sha1sum
(The +11 means "start output at byte 11", i.e. skip exactly the 10 header bytes.)
Also tail --bytes=+11 $archive | openssl sha1
could be useful in an environment where you don't have sha1sum.
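The effect of the timestamp bytes is easy to demonstrate locally. This sketch (assuming GNU coreutils and gzip; the filenames are made up) compresses the same data twice with different mtimes: the full-file hashes differ, but the hashes of the header-stripped streams match:

```shell
#!/bin/sh
# Demonstration: only the MTIME field (bytes 4-7 of the 10-byte gzip
# header) differs between the two archives below.
printf 'same payload\n' > data
touch -d '2009-01-01 00:00' data
gzip -c data > a.gz          # header stores data's mtime
touch -d '2009-06-01 00:00' data
gzip -c data > b.gz          # same payload, different mtime

cmp -s a.gz b.gz || echo "full archives differ (timestamp bytes)"
[ "$(tail -c +11 a.gz | sha1sum)" = "$(tail -c +11 b.gz | sha1sum)" ] \
    && echo "header-stripped hashes match"
```

Note that this relies on the server always producing the gzip stream the same way apart from the timestamp, which is the behavior described in the question.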