开发者

Recursive MD5 and probability of collision

开发者 https://www.devze.com 2023-04-05 21:13 出处:网络
I wonder if it is \'safe\' to hash a bunch of MD5 hash values together to create a new hash or whether this will in any way increase the probability of collisions.

I wonder if it is 'safe' to hash a bunch of MD5 hash values together to create a new hash or whether this will in any way increase the probability of collisions.

The background: I have a couple of files with dependencies开发者_如何学Python. Each file has an associated hash value which is calculated based on it's content. Let's call this the 'single-file' hash value. In addition to this, the file should also have a hash value which includes all the dependent files, the 'multi-file' hash value.

So the question is: Can I just take all the single-file MD5 hash values of the dependent files, concatenate them and then calculate an MD5 over the concatenated values to get the multi-file hash value. Or will this result in an MD5 hash that is more likely to collide than if I would concatenate the content of all dependent files together.

Alternatively, could I xor the single-file hash values together to generate a multi-file hash value, or would this likely result in more collisions?


Sounds like you need a Merkel Tree


MD5 has a lot of collision problems, see MD5 entry on Wikipedia.

However, if you use MD5 not for security but as a unique marker to check dependencies, even hashing contatenated hashes should be pretty safe.

Or, if it's not too late, switch to SHA-1.


I think the risks of a collision is about the same for hashing the concatenated files, as to hashing the concatenated file hashes.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号