MD5 checksums are widely used for integrity checking when downloading big files over HTTP. My question is this: TCP itself provides a reliability mechanism (i.e. a checksum on each TCP segment to ensure its integrity), so, in short, TCP is reliable. HTTP is based on TCP, so HTTP should also be reliable. Why, then, do we need another integrity-checking mechanism (i.e. an MD5 checksum)?
Thanks in advance, George
Most often you use the hash sum for an out-of-band check of the download's integrity (the hash is printed on the website, for example), not a programmatic one.
This lets you detect manipulation of the downloaded artifact.
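If you do want to automate that comparison, here is a minimal sketch in Python. The names `file_md5` and `matches_published` are made up for illustration, and it assumes you have copied the published hex digest from the website by hand. It reads the file in chunks so a multi-gigabyte ISO does not have to fit in memory.

```python
import hashlib

def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_published(path, published_hex):
    """Compare the local digest with the one copied from the website."""
    # Published digests are often upper-case or carry stray whitespace.
    return file_md5(path) == published_hex.strip().lower()
```

Usage would look like `matches_published("download.iso", "<digest from the website>")`, where both arguments are placeholders. Note that MD5 is fine for spotting accidental corruption, but it is no longer collision-resistant, so for protection against deliberate tampering a site would normally publish a SHA-256 digest instead (`hashlib.sha256` is a drop-in replacement here).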
More than three times in my life I have downloaded a broken ISO or EXE that worked fine when I downloaded it again. To me, this proves that the TCP mechanism alone isn't enough to ensure integrity.
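One plausible contributing factor (not necessarily what happened in the cases above) is that the TCP checksum is only a 16-bit ones' complement sum, so some corruptions cancel out. The sketch below implements that sum over a payload only; real TCP also covers a pseudo-header and the TCP header, which is left out here. The function name `internet_checksum` and the sample byte strings are made up for illustration.

```python
import hashlib

def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

original = b"ABCDEFGH"
reordered = b"CDABEFGH"  # the first two 16-bit words are swapped

# Addition is commutative, so the word swap goes unnoticed by the checksum.
print(internet_checksum(original) == internet_checksum(reordered))
# MD5 depends on byte order, so the digests differ.
print(hashlib.md5(original).hexdigest() == hashlib.md5(reordered).hexdigest())
```

The first comparison prints `True` and the second prints `False`. Corruption can also happen where TCP never looks, for example in a proxy, in RAM, or on disk after the data has been received.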
The answer is simple. The source file may already be corrupt before you even begin downloading. TCP only verifies that the bytes you receive are the same as the bytes the server sent. An MD5 checksum, provided it was computed from a known-good copy, lets you know whether the file is corrupt regardless of whether the cause is a problem in transfer or in the source file itself.
When it comes to the 35G TED-LIUM corpus or the even larger 400G tiny-images dataset, there seems to be some error in the downloaded file almost every time. For the 35G TED-LIUM corpus, I downloaded it at least 20 times, about 700G of network transfer in total over several months. CRC is just a nightmare.