
Compare files byte by byte or read all bytes?

Developer https://www.devze.com 2023-01-30 15:47 Source: web
I came across this code at http://support.microsoft.com/kb/320348, which made me wonder what the best way is to compare two files to find out whether they differ.


The goal is to optimize my program, which needs to check whether each file is equal or not so that it can build a list of changed files and/or files to delete or create.
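A minimal sketch of that overall task, written in Python because the question names no language (an assumption): walk two directory trees and sort the relative paths into "create", "delete" and "changed". The helper names `relative_files` and `plan_sync` are hypothetical. The content check uses the standard library's `filecmp.cmp(..., shallow=False)`, which treats files as different if their sizes or contents differ and reads them in fixed-size blocks.

```python
import filecmp
import os


def relative_files(root):
    """Collect every file under root as a path relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def plan_sync(src_dir, dst_dir):
    """Hypothetical helper returning (to_create, to_delete, changed).

    "create": only in src_dir; "delete": only in dst_dir;
    "changed": in both, but with different contents.
    """
    src = relative_files(src_dir)
    dst = relative_files(dst_dir)
    to_create = sorted(src - dst)
    to_delete = sorted(dst - src)
    changed = sorted(
        rel
        for rel in src & dst
        # shallow=False: compare sizes first, then the actual contents
        if not filecmp.cmp(os.path.join(src_dir, rel),
                           os.path.join(dst_dir, rel), shallow=False)
    )
    return to_create, to_delete, changed
```

Files present on only one side never need their contents read; only paths present on both sides go through the content comparison.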

Currently I compare the sizes of the two files, and if they match I compute an MD5 checksum of each. But after looking at the code linked at the beginning of this question, I wonder whether it is really worth using checksums over that approach, since creating a checksum of the two files basically means reading all their bytes anyway.
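For reference, the approach described above (size check, then an MD5 checksum of each file) might look like this in Python. The question doesn't show its code, so the names `md5_of` and `files_differ` and the 64 KiB read size are assumptions. Note that whenever the sizes match, this reads both files completely.

```python
import hashlib
import os


def md5_of(path, block_size=64 * 1024):
    """Hash a file incrementally so large files never sit fully in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(path_a, path_b):
    """Cheap size check first, then fall back to comparing MD5 checksums."""
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return True
    return md5_of(path_a) != md5_of(path_b)
```

Hashing mainly pays off when a checksum can be stored and reused (for example, computed once for the previous version of a file) rather than recomputed for both sides on every run.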

Also, what other checks could I make to reduce the work of checking each file?


Read both files in small chunks (4K or 8K) into buffers, which is optimised for reading, and then compare the buffers in memory (byte by byte), which is optimised for comparing.

This performs well in all cases, whether the difference is at the start, the middle or the end, because you stop reading at the first buffer that doesn't match.

Of course, the first step is to check whether the file lengths differ; if they do, the files are definitely different.
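The steps in this answer (length check, then buffer-by-buffer comparison) can be sketched in Python as follows. The function name `files_equal` is hypothetical; the 8 KiB default follows the answer's "4K or 8K" suggestion.

```python
import os

BUFFER_SIZE = 8 * 1024  # the answer suggests 4K or 8K


def files_equal(path_a, path_b, buffer_size=BUFFER_SIZE):
    """Length check first, then compare the files buffer by buffer."""
    # Different lengths: the files are certainly different, no reading needed.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            # Buffered binary reads on regular files return exactly
            # buffer_size bytes except at end of file.
            chunk_a = fa.read(buffer_size)
            chunk_b = fb.read(buffer_size)
            # Comparing two bytes objects compares the buffers in memory.
            if chunk_a != chunk_b:
                return False  # stop at the first mismatching buffer
            if not chunk_a:
                return True  # both files ended together with no mismatch
```

Python's standard library `filecmp.cmp(a, b, shallow=False)` follows the same pattern (size check, then fixed-size block comparison) if you'd rather not maintain your own.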


If you haven't already computed hashes of the files, then you might as well do a proper comparison (instead of looking at hashes), because if the files are the same it's the same amount of work, but if they're different you can stop much earlier.

Of course, comparing a byte at a time is probably a bit wasteful; it's probably a good idea to read whole blocks at a time and compare them.
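To make the "stop much earlier" argument concrete, here is a small Python sketch that counts how many bytes each strategy reads from the first file before reaching a verdict. The helpers `compare_blocks` and `compare_hashes` are hypothetical, the 8 KiB block size is an arbitrary choice, and MD5 stands in for whatever hash you would use.

```python
import hashlib

BLOCK = 8 * 1024  # arbitrary block size for this illustration


def compare_blocks(path_a, path_b):
    """Block-wise comparison; returns (equal, bytes read from path_a)."""
    read = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(BLOCK)
            b = fb.read(BLOCK)
            read += len(a)
            if a != b:
                return False, read  # early exit at the first differing block
            if not a:
                return True, read  # reached the end of both files


def compare_hashes(path_a, path_b):
    """Hash both files in full; returns (equal, bytes read from path_a)."""
    def digest(path):
        h = hashlib.md5()
        total = 0
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(BLOCK), b""):
                h.update(chunk)
                total += len(chunk)
        return h.digest(), total

    digest_a, read = digest(path_a)
    digest_b, _ = digest(path_b)
    return digest_a == digest_b, read
```

For two 1 MiB files that differ in their first byte, `compare_blocks` returns `(False, 8192)` while `compare_hashes` returns `(False, 1048576)`; for identical files, both read every byte.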
