I have an application that downloads a file from the server. The connection is very unstable, so we are implementing a file-integrity check that tells us when a file was not downloaded correctly so we can handle it accordingly.
How should I go about this process? Right now I make a request to the server for the file's hash, then I make another request for the file itself, then I compute the hash of the downloaded file and compare the two hashes.
Is this the right approach? Something tells me it is not. If the hashes turn out to be different, I go through the exact same process a few times, including requesting the hash again (which should be the same). Should I bother requesting the hash every time? I only do it in case the hash itself was not transferred correctly. Is that unnecessary? Is there a way for me to reduce the number of requests, since they are expensive and things are very slow right now?
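For reference, the current flow looks roughly like this (the URLs and the choice of SHA-256 are just placeholders for illustration):

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.MessageDigest;

// Rough sketch of the two-request flow described above: fetch the expected
// hash, then download the file while hashing it, and compare the two.
public class HashCheckedDownload {

    public static void main(String[] args) throws Exception {
        String expected = fetchText("http://example.com/file.hash");    // request 1: the hash
        String actual = downloadAndHash("http://example.com/file.bin"); // request 2: the file
        if (!expected.equalsIgnoreCase(actual)) {
            System.err.println("Hash mismatch - retry the download");
        }
    }

    // Download the file, hashing the bytes as they arrive.
    static String downloadAndHash(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
                // ...write buf[0..n) to the local file here as well...
            }
        }
        return toHex(md.digest());
    }

    // Fetch a small plain-text response (the hex-encoded hash).
    static String fetchText(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
            byte[] buf = new byte[1024];
            int n;
            while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
            return out.toString("US-ASCII").trim();
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```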
Any ideas?
Just in case it matters the server is using C# and the client is an android device (JAVA).
Thanks,
TCP/IP does integrity checking on its own; you don't have to. Each packet's integrity is covered by checksums (plus a link-layer CRC), and TCP detects lost packets and requests retransmission. So as long as your server generates the Content-Length header, you can be reasonably sure that mistransmission is detected and the client errors out.
That said, a good place for a file hash would be a custom HTTP header. Prefix its name with "X-", so that it does not collide with existing or future standard headers.
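For illustration, here is a client-side sketch in Java. The header name "X-File-Sha256" is made up, and your server would have to set it on the file response itself; doing it this way also checks Content-Length and removes the need for a separate hash request:

```java
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.security.MessageDigest;

// Single-request variant: the server sends the hash in a custom header
// (here "X-File-Sha256" - a hypothetical name) on the file response, so no
// separate hash request is needed. Content-Length is checked as well.
public class HeaderCheckedDownload {

    static void download(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        String expectedHash = conn.getHeaderField("X-File-Sha256"); // hypothetical header
        long expectedLength = conn.getContentLength();              // from Content-Length, -1 if absent

        MessageDigest md = MessageDigest.getInstance("SHA-256");
        long received = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
                received += n;
                // ...write buf[0..n) to the local file here...
            }
        }

        if (expectedLength >= 0 && received != expectedLength) {
            throw new java.io.IOException("Truncated download: " + received + " of " + expectedLength);
        }
        if (expectedHash != null && !expectedHash.equalsIgnoreCase(toHex(md.digest()))) {
            throw new java.io.IOException("Hash mismatch - file corrupted in transit");
        }
    }

    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```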
Yes, there is a better way. Firstly, instead of requesting a hash of the entire file, compress the file and segment the compressed data into (say) 100 KB blocks, then supply a sequence of hashes, one per block, followed by a self-hash of that sequence of hashes. By a self-hash I just mean taking the vector of hashes, hashing it, and sticking the result on the end of the vector.
You can now verify that this vector of hashes transferred correctly by checking the self-hash. If it doesn't pass, re-request the hash vector.
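A minimal sketch of that check on the client, assuming SHA-256 block hashes with the self-hash appended as the last entry (the exact wire format is up to you):

```java
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

// Verify the hash manifest: the last entry is assumed to be a hash computed
// over the concatenation of all the block hashes that precede it.
public class HashManifest {

    static boolean verifyManifest(List<byte[]> hashes) throws Exception {
        if (hashes.isEmpty()) return false;
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        for (int i = 0; i < hashes.size() - 1; i++) {
            md.update(hashes.get(i));               // hash the vector of block hashes
        }
        byte[] selfHash = md.digest();
        return Arrays.equals(selfHash, hashes.get(hashes.size() - 1));
    }
}
```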
The second phase is then to request the transfer of the compressed data. As it comes across, you can check at 100 KB intervals that the transfer is correct, aborting as soon as you hit an error. Then (if possible) restart the request from where you left off, a "high-tide mark".
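A sketch of that verification loop, again assuming 100 KB blocks and SHA-256; the returned byte count is the high-tide mark, which could feed an HTTP Range request on the next attempt if your server supports ranges:

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.List;

// Phase two: read the compressed stream in 100 KB blocks, hash each block as
// it arrives, compare against the manifest, and stop at the first mismatch.
// The return value is the "high tide mark" - the number of bytes known good.
public class BlockVerifier {

    static final int BLOCK_SIZE = 100 * 1024;

    static long receiveVerified(InputStream in, OutputStream out,
                                List<byte[]> blockHashes, long startOffset) throws Exception {
        long goodBytes = startOffset;
        int blockIndex = (int) (startOffset / BLOCK_SIZE);
        byte[] block = new byte[BLOCK_SIZE];

        while (blockIndex < blockHashes.size()) {
            int filled = readFully(in, block);
            if (filled == 0) break;                       // stream ended
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(block, 0, filled);
            if (!Arrays.equals(md.digest(), blockHashes.get(blockIndex))) {
                break;                                    // corrupt block: stop here
            }
            out.write(block, 0, filled);                  // only keep verified data
            goodBytes += filled;
            blockIndex++;
        }
        return goodBytes;                                 // resume point for the next attempt
    }

    // Fill the buffer as far as the stream allows; returns the number of bytes read.
    static int readFully(InputStream in, byte[] buf) throws java.io.IOException {
        int off = 0;
        while (off < buf.length) {
            int n = in.read(buf, off, buf.length - off);
            if (n == -1) break;
            off += n;
        }
        return off;
    }
}
```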
Finally you can safely decompress the data. Many decompression algorithms perform a further integrity check of their own, which gives you another round of verification, defending against any programming mistakes. A free check is worth having.
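For example, if the compressed stream is gzip, Java's GZIPInputStream checks the CRC-32 stored in the gzip trailer as you read and throws an IOException if the decompressed data doesn't match (file names here are placeholders):

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// The "free" check: GZIPInputStream verifies the gzip trailer's CRC-32 as the
// stream is consumed; a corrupt file surfaces as an IOException from read().
public class Decompress {

    static void gunzip(String gzPath, String outPath) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new FileInputStream(gzPath));
             FileOutputStream out = new FileOutputStream(outPath)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```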
This approach will work regardless of whether you're transferring over a checked protocol like TCP/IP or an unreliable one like UDP. Compressing the data, if you don't do it already, will be a significant improvement too.
The only downside is that it is obviously a lot more work.