Every once in a while, downloading files (especially large ones) over FTP produces corrupted results. I suspect that's also part of the reason all major sites publish separate checksums alongside their downloads.
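As a rough sketch of how those published checksums get used, assuming the site publishes a SHA-256 hex digest (many use SHA-256, some still use MD5 or SHA-1), a client can hash the downloaded file and compare. The helper names `sha256_of_file` and `matches_published_checksum` are hypothetical.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Fixed-size chunks mean a large download never has to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare the file's digest with a checksum copied from the download page."""
    # Tolerate stray whitespace and upper-case hex in the published value.
    return sha256_of_file(path) == published_hex.strip().lower()
```

Unlike TCP's per-segment checksum, this covers the whole file end to end, so it also catches truncation and any modification made above the TCP layer.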
How is this possible, if FTP runs over TCP, which has a built-in checksum and retransmits data that arrives corrupted?
One could argue that this is due to the short length of the checksum in TCP (16 bits, I think), so collisions simply happen too often. But: 1) for this to be true, not only must there be a collision, but the random network error must also modify both the checksum in the packet and the packet data itself, in such a way that the checksum is valid for the new packet. Even with a 16-bit checksum, is that really so likely? 2) There don't seem to be many errors in, say, web browsing, which also runs over TCP/IP.
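For reference, TCP's checksum is not a CRC but the 16-bit ones'-complement "Internet checksum" described in RFC 1071. In real TCP it is computed over a pseudo-header, the TCP header, and the payload; the sketch below covers only a payload byte string, and the function name `internet_checksum` is hypothetical. It shows one concrete weakness: because it is a plain sum, it catches any single flipped bit but cannot see 16-bit words being reordered.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # add each big-endian 16-bit word
    # Fold carries out of the top 16 bits back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


segment = b"ABCDEFGH"
flipped = bytes([segment[0] ^ 0x01]) + segment[1:]   # one bit flipped
swapped = segment[2:4] + segment[0:2] + segment[4:]  # first two 16-bit words swapped

print(internet_checksum(segment) != internet_checksum(flipped))  # True: detected
print(internet_checksum(segment) == internet_checksum(swapped))  # True: missed
```

So the odds of an undetected error depend heavily on what kind of corruption occurs, not just on the 16-bit length.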
FTP distinguishes between ASCII and BINARY transfer types, and in ASCII mode it can modify the data stream by translating line endings between the two hosts' conventions (for example LF to CRLF). Transferring a binary file in ASCII mode is the most common cause of corrupted FTP downloads I've encountered. (The TCP checksums are computed on the already-modified data, so nothing looks amiss at the TCP level.)
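A toy simulation of that failure, assuming a Unix sender and a Windows receiver so every LF byte ends up as CR LF. The function `simulate_ascii_transfer` is a hypothetical model; real clients and servers differ in the details.

```python
import hashlib


def simulate_ascii_transfer(data: bytes) -> bytes:
    """Hypothetical model of an ASCII-mode transfer that rewrites LF as CR LF."""
    return data.replace(b"\n", b"\r\n")


original = bytes(range(256))  # binary data that happens to contain one 0x0A byte
received = simulate_ascii_transfer(original)

print(len(original), len(received))  # 256 257
print(hashlib.sha256(original).hexdigest() == hashlib.sha256(received).hexdigest())  # False
```

Every TCP segment carrying `received` can have a perfectly valid checksum; the damage was done before the bytes reached TCP, which is why only an end-to-end check on the file notices it.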
Next most common, I suppose, would be a transfer that gets truncated due to a timeout or other network error. In that case the TCP checksums would be locally correct, but the partially downloaded file is corrupt.
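A minimal sketch of catching truncation, assuming the expected size is known, e.g. from the server's SIZE reply (Python's `ftplib.FTP.size()` sends it, but SIZE isn't part of the original FTP standard and not every server supports it) or from the download page. The helper name `looks_truncated` is hypothetical.

```python
import os


def looks_truncated(local_path: str, expected_size: int) -> bool:
    """Return True if the local file is shorter than the size the server reported."""
    # A short file is the typical symptom of a transfer that was cut off.
    return os.path.getsize(local_path) < expected_size
```

A size check is cheap, but it only detects missing bytes; a file of the right length with altered contents still needs a checksum comparison.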
The FTP protocol is a bit firewall-unfriendly, since it can involve external hosts connecting back on unpredictable port numbers, but that usually manifests as an inability to transfer anything at all, rather than a corrupted download.
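A sketch of a download that sidesteps both issues above using Python's standard `ftplib`: passive mode, so the client opens the data connection instead of waiting for the server to connect back, and binary mode, so no line-ending translation happens. The function name `download_binary` and the host and file names in the comment are hypothetical.

```python
import ftplib


def download_binary(ftp: ftplib.FTP, remote_name: str, local_path: str) -> None:
    """Fetch one file in binary mode over a passive-mode data connection."""
    # Passive mode: the client connects out to the server for the data channel,
    # so no inbound connection has to get through the client's firewall or NAT.
    # (ftplib already defaults to passive; this just makes it explicit.)
    ftp.set_pasv(True)
    with open(local_path, "wb") as fh:
        # retrbinary switches the session to binary (TYPE I) before sending RETR,
        # so the bytes are written exactly as the server sent them.
        ftp.retrbinary(f"RETR {remote_name}", fh.write)


# Usage, with hypothetical names:
# with ftplib.FTP("ftp.example.com") as ftp:
#     ftp.login()  # anonymous login
#     download_binary(ftp, "release.tar.gz", "release.tar.gz")
```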
Apart from ASCII vs. BINARY issues, I can't think of a reason why FTP connections should be more susceptible to corrupted transfers. Maybe you just notice them more, because they tend to be things like binaries or compressed files that need to be bit-for-bit complete and correct, and if not you get a big ugly error message. One is much less likely to notice, say, a missing advertisement on a web page because the connection to the ad network timed out.
A 16-bit checksum isn't startlingly strong, especially considering the size of some FTP transfers (software downloads, for example). However, the lower layers add their own CRCs on each link (Ethernet frames carry a 32-bit CRC, for instance), which compensate.
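To illustrate why a CRC is the stronger check, here is a small comparison using `zlib.crc32` from Python's standard library, which implements the same CRC-32 polynomial Ethernet uses. Swapping two 16-bit words leaves a ones'-complement sum like TCP's unchanged, because addition ignores order, but CRC-32 is position-sensitive and detects every error burst of 32 bits or fewer, which this swap is.

```python
import zlib

segment = b"ABCDEFGH"
swapped = segment[2:4] + segment[0:2] + segment[4:]  # swap the first two 16-bit words

# Same bytes in a different order: a plain sum can't tell, CRC-32 can.
print(zlib.crc32(segment) != zlib.crc32(swapped))  # True
```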
I don't think I've had a corrupt FTP download this century myself.