I wonder if it is 'safe' to hash a bunch of MD5 hash values together to create a new hash or whether this will in any way increase the probability of collisions.
The background: I have a couple of files with dependencies开发者_如何学Python. Each file has an associated hash value which is calculated based on it's content. Let's call this the 'single-file' hash value. In addition to this, the file should also have a hash value which includes all the dependent files, the 'multi-file' hash value.
So the question is: Can I just take all the single-file MD5 hash values of the dependent files, concatenate them and then calculate an MD5 over the concatenated values to get the multi-file hash value. Or will this result in an MD5 hash that is more likely to collide than if I would concatenate the content of all dependent files together.
Alternatively, could I xor the single-file hash values together to generate a multi-file hash value, or would this likely result in more collisions?
Sounds like you need a Merkel Tree
MD5 has a lot of collision problems, see MD5 entry on Wikipedia.
However, if you use MD5 not for security but as a unique marker to check dependencies, even hashing contatenated hashes should be pretty safe.
Or, if it's not too late, switch to SHA-1.
I think the risks of a collision is about the same for hashing the concatenated files, as to hashing the concatenated file hashes.
精彩评论