I'm writing a file system deduper. The first pass generates md5 checksums, and the second pass compares the files with identical checksums.
Is there a collection of strings which differ but generate identical md5 checksums I can incorporate into my test case collection?
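For reference, the two-pass approach described in the question might look roughly like this. It is a minimal Python sketch, not the asker's code: files are bucketed by MD5, then each bucket is confirmed with a full content comparison via the standard library's `filecmp.cmp(..., shallow=False)`. The names `md5_of_file` and `find_duplicates` are hypothetical.

```python
import filecmp
import hashlib
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents are byte-for-byte identical."""
    # Pass 1: bucket files by MD5 checksum.
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[md5_of_file(p)].append(p)

    # Pass 2: inside each bucket, confirm equality by comparing contents,
    # so an MD5 collision never merges two different files.
    duplicates = []
    for bucket in by_digest.values():
        groups = []  # each group holds paths with identical content
        for p in bucket:
            for g in groups:
                if filecmp.cmp(g[0], p, shallow=False):
                    g.append(p)
                    break
            else:
                groups.append([p])
        duplicates.extend(g for g in groups if len(g) > 1)
    return duplicates
```

Colliding inputs are exactly what exercises the inner loop: two different files landing in the same bucket must end up in separate groups.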
Update: mjv's answer points to these two files, perfect for my test case.
- http://www.win.tue.nl/~bdeweger/CollidingCertificates/MD5Collision.certificate1.cer
- http://www.win.tue.nl/~bdeweger/CollidingCertificates/MD5Collision.certificate2.cer
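Before adding the two certificates to a test suite, a quick sanity check can confirm that they really differ byte-wise while sharing an MD5 digest. This is a minimal sketch that assumes the files have been downloaded locally; `is_md5_collision` is a hypothetical helper name.

```python
import hashlib


def is_md5_collision(path_a, path_b):
    """True if two files differ in content yet share the same MD5 digest."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    # Identical files are not a collision; only differing content counts.
    return a != b and hashlib.md5(a).digest() == hashlib.md5(b).digest()
```

Called as `is_md5_collision("MD5Collision.certificate1.cer", "MD5Collision.certificate2.cer")` (assuming the original file names were kept), it should return `True` if the downloads are intact.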
You can find a couple of different X.509 certificate files with the same MD5 hash at this URL.
I don't know of any repositories of MD5-colliding files, but you can probably create your own using the executables and/or the techniques described on Vlastimil Klima's page on MD5 collisions.
MD5 is indeed known for its weak collision resistance. However, I wouldn't rule it out for a project such as your file system de-duper; you may just want to add a couple of additional criteria (which can be very cheap computationally) to further reduce the chance of false duplicates.
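File size is one example of such a cheap criterion, though the answer does not say which criteria it has in mind. The minimal sketch below assumes size is used: files are first grouped by `os.path.getsize`, only files whose size is shared get hashed, and candidates are keyed on `(size, md5)`. The helper names are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def candidate_groups(paths):
    """Group paths by (size, MD5), hashing only files whose size is shared."""
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    by_key = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for p in group:
            by_key[(size, md5_of_file(p))].append(p)
    return [g for g in by_key.values() if len(g) > 1]
```

The size check costs a single `stat` per file and lets most files skip hashing entirely, so it narrows the candidate set at almost no extra cost.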
Alternatively, for test purposes, you could simply modify your MD5 comparison logic so that it deems some MD5 values identical even though they are not (say, if the least significant byte of the MD5 matches, or systematically every 20 comparisons, or at random). This may be less painful than having to manufacture effective MD5 "twins".
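One way to do this is to make the digest comparison a pluggable predicate and swap in a deliberately weak one during tests. The minimal sketch below implements the "least significant byte" variant by comparing only the last byte of the 16-byte digest. All function names are hypothetical.

```python
import hashlib


def digests_equal(d1: bytes, d2: bytes) -> bool:
    """Production rule: MD5 digests match only if all 16 bytes match."""
    return d1 == d2


def digests_equal_for_testing(d1: bytes, d2: bytes) -> bool:
    """Test-only rule: digests 'match' if their last byte matches.

    Roughly 1 in 256 unrelated pairs is reported as matching, which
    forces the byte-by-byte second pass to reject them.
    """
    return d1[-1] == d2[-1]


def candidate_pairs(blobs, digests_match):
    """Return (i, j) index pairs whose MD5 digests the predicate deems equal."""
    digests = [hashlib.md5(b).digest() for b in blobs]
    return [
        (i, j)
        for i in range(len(blobs))
        for j in range(i + 1, len(blobs))
        if digests_match(digests[i], digests[j])
    ]
```

Passing `digests_equal_for_testing` instead of `digests_equal` yields plenty of fake "twins", and any returned pair whose contents differ must be discarded by the second pass. A counter-based or random predicate can be swapped in the same way.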
http://www.nsrl.nist.gov/ might be what you want.