I need to produce a checksum for a bunch of data in Perl, and I came across the Digest::MD5 module. It looks like it will fit the bill, but I thought I would ask here to see if anyone has any advice, knows of a better module to use, or can suggest a more suitable digest algorithm. What's being hashed is about 10 tables' worth of data (one logical tuple at a time). This will be the first time I make use of checksums, so any tips, tricks, or gotchas would be very much appreciated.
Edit: As far as I know, there's nothing wrong with Digest::MD5, but I've never used it, nor am I familiar with hash algorithms. I was hoping someone with experience would be able to tell me if I was on the right track or not. I just wanted a little bit of confirmation before going too far.
Yes, Digest::MD5 will do the trick; it's written by Gisle Aas (author of LWP among other excellent packages) and has good reviews & ratings on cpanratings, both of which should reassure you that it's a good choice.
Using it can be as simple as:
use Digest::MD5 qw(md5_hex);
my $checksum = md5_hex($data);
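Since you mention hashing one logical tuple at a time, here is a rough sketch of how that might look. The helper name `tuple_checksum` and the length-prefixed serialization are my own assumptions, not anything Digest::MD5 prescribes. It tries to avoid two common gotchas: `md5_hex` croaks on wide characters, so each value is encoded to UTF-8 bytes first (this assumes your values are Perl character strings), and simply concatenating the fields would make ('ab', 'c') and ('a', 'bc') hash to the same value.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode_utf8);

# Hypothetical helper: checksum one logical tuple (a list of column values).
# Each defined field is written as "<byte length>:<bytes>", so field
# boundaries stay unambiguous; undef (e.g. SQL NULL) is written as "N;"
# to keep it distinct from the empty string.
sub tuple_checksum {
    my @fields = @_;
    my $serialized = '';
    for my $field (@fields) {
        if (defined $field) {
            my $bytes = encode_utf8($field);   # md5_hex needs bytes, not wide characters
            $serialized .= length($bytes) . ':' . $bytes;
        }
        else {
            $serialized .= 'N;';
        }
    }
    return md5_hex($serialized);
}

# Example row with made-up values: an id, a name, a NULL and an accented string.
my $sum = tuple_checksum(42, 'Alice', undef, "caf\x{e9}");
print "$sum\n";
```

Whatever serialization you pick, keep the column order and encoding fixed, otherwise the same tuple will produce different checksums on different runs.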
If you think you might change your chosen digest algorithm in the future (for instance, to SHA-1), you might want to consider using the Digest module instead. It is also written by Gisle Aas and provides a common interface to the various Digest::* modules.
For example:
use Digest;
my $digest = Digest->new('MD5');
$digest->add($data); # to add data from a scalar, or:
$digest->addfile($filehandle); # to add data read from a filehandle
my $checksum = $digest->hexdigest; # or just ->digest for binary
That approach has the benefit that you could just change the "MD5" to e.g. "SHA-1", and you're done.
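To make that concrete, here is a minimal sketch in which the algorithm name is just a parameter. The helper name `checksum_hex` is made up for illustration; the `Digest->new`, `add` and `hexdigest` calls are the ones from the Digest interface shown above.

```perl
use strict;
use warnings;
use Digest;

# Hypothetical helper: compute a hex checksum with whichever algorithm is named.
# Digest->new dies if no module implementing that algorithm is installed.
sub checksum_hex {
    my ($algorithm, $data) = @_;
    my $digest = Digest->new($algorithm);
    $digest->add($data);
    return $digest->hexdigest;
}

my $algorithm = 'MD5';   # change to 'SHA-1' to switch algorithms
print checksum_hex($algorithm, 'abc'), "\n";
```

If the algorithm name lives in one place (a constant or a config value), switching later is a one-line change, although any checksums you have already stored would need to be recomputed.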
Just for completeness, I'll add why you might want to design with the ability to switch hashing algorithms easily. If this were used for any security purpose, MD5 would be a poor choice: it has been shown to be vulnerable to hash collisions, and the US Department of Homeland Security advises that MD5 "should be considered cryptographically broken and unsuitable for further use". However, for general checking of data integrity (detecting accidental changes rather than deliberate tampering), it's still an acceptable choice for many, and is widely supported.
SHA-1 is also considered weak; the SHA-2 family (for example SHA-256) is generally the recommended choice when you need secure hashing for cryptographic purposes.
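If you do want a SHA-2 digest, one option is the Digest::SHA module, which exports functions in the same style as Digest::MD5. A minimal sketch, assuming Digest::SHA is available (it ships with recent versions of Perl); the value of `$data` is a placeholder:

```perl
use strict;
use warnings;
use Digest::SHA qw(sha256_hex);

my $data = 'abc';                   # placeholder input
my $checksum = sha256_hex($data);   # 64 hex characters for SHA-256
print "$checksum\n";
```

With the generic Digest interface, passing 'SHA-256' to `Digest->new` should give the same result.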