I have two binary files (on the order of tens of MB each) and I want to OR every bit of one with the corresponding bit of the other. And of course, I want it to be as efficient as possible.
I have two approaches in mind, but I suspect there is a more efficient way that I don't know of.
Given files a and b, what I want to do is a = a | b.
- Load both files, parse them into two huge std::bitsets, and OR them together.
- Read both files byte by byte and OR them in a huge for loop.
Is there any other way to do that?
Don't go byte-by-byte. That'd be seriously slow. Instead, read the files in chunks. Find what the block size is for your system (4k? 8K? 64k?) and read the file using chunks of that size. Then you can loop through the byte streams in memory and do the OR operations there.
In logical terms, even though you might only be reading a byte at a time, the OS will still read an entire block worth of data, then throw away all but the byte you wanted. Next time around that block'll be cached, but it's still going through the full read motions for every byte you want. So... just suck the entire block into memory and save yourself that wasted overhead.
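As a rough illustration of the chunked approach, here is a minimal sketch that ORs file b into file a in place. The function name `or_file_into` and the 64 KB default chunk size are assumptions made for the example, not anything measured; bytes of a past the end of b are left untouched, and bytes of b past the end of a are ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: computes a = a | b on disk, one chunk at a time.
// Returns the number of bytes that were ORed and written back to a.
std::size_t or_file_into(const std::string& a_path, const std::string& b_path,
                         std::size_t chunk_size = 64 * 1024) {
    std::fstream a(a_path, std::ios::in | std::ios::out | std::ios::binary);
    std::ifstream b(b_path, std::ios::binary);
    if (!a || !b) return 0;

    std::vector<char> buf_a(chunk_size), buf_b(chunk_size);
    std::size_t total = 0;
    std::streamoff offset = 0;

    for (;;) {
        // Read the next chunk from both files.
        a.seekg(offset);
        a.read(buf_a.data(), static_cast<std::streamsize>(chunk_size));
        const std::size_t got_a = static_cast<std::size_t>(a.gcount());
        b.read(buf_b.data(), static_cast<std::streamsize>(chunk_size));
        const std::size_t got_b = static_cast<std::size_t>(b.gcount());

        const std::size_t n = std::min(got_a, got_b);
        if (n == 0) break;

        // OR the two chunks in memory.
        for (std::size_t i = 0; i < n; ++i) buf_a[i] |= buf_b[i];

        // A short read sets eofbit/failbit; clear them before writing back.
        a.clear();
        a.seekp(offset);
        a.write(buf_a.data(), static_cast<std::streamsize>(n));

        total += n;
        offset += static_cast<std::streamoff>(n);
        if (n < chunk_size) break;  // one of the files is exhausted
    }
    return total;
}
```

Calling `or_file_into("a.bin", "b.bin")` would update a.bin in place. If you would rather not modify a directly, the same loop works with a separate output stream.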
I would recommend loading the two files a chunk at a time, where a chunk is some appropriate portion of the data. The best size depends on your operating system and filesystem, but it's usually something like the cluster size, or twice the cluster size, and so on. You would have to run some tests to determine the best buffer size.
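One way to run such a test is to time a plain sequential read with a few candidate buffer sizes. The sketch below is only an assumption about how you might measure it; `ReadTiming` and `time_chunked_read` are made-up names for the example.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical result type for the measurement.
struct ReadTiming {
    std::size_t bytes_read;
    double milliseconds;
};

// Reads the whole file sequentially using buffers of chunk_size bytes
// and reports how many bytes were read and how long it took.
ReadTiming time_chunked_read(const std::string& path, std::size_t chunk_size) {
    std::ifstream in(path, std::ios::binary);
    std::vector<char> buf(chunk_size);
    std::size_t total = 0;

    const auto start = std::chrono::steady_clock::now();
    while (in) {
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        total += static_cast<std::size_t>(in.gcount());
    }
    const auto stop = std::chrono::steady_clock::now();

    return {total, std::chrono::duration<double, std::milli>(stop - start).count()};
}
```

You could call it with, say, 4096, 8192, and 65536 and compare the timings. Keep in mind that the OS caches file data, so the first run on a file is usually slower than the ones after it; repeat each measurement a few times before drawing conclusions.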
I don't think you would see a performance advantage either way (provided that in your "second option" you load the file in big chunks). After all, you'd be using a big fixed-size buffer in both cases (which is what std::bitset boils down to), so go with the one you like best. Keep in mind that the size of a std::bitset is fixed at compile time and its storage lives inside the object itself, so one covering tens of MB should not be a local variable on the stack.
The only advantage I see in std::bitset::operator|=, besides clarity, is that it may be able to exploit some platform-specific trick to OR big sequences of bytes, but I think the compiler would be able to optimize your big "OR loop" anyway.
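To make the "big OR loop" concrete, here is a small sketch of two in-memory versions: a plain byte loop, and one that ORs eight bytes at a time. Both function names are made up for the example. Optimizing compilers will often vectorize the plain loop on their own, so the word-wise variant is shown only for comparison; measure before assuming it is faster.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Plain byte-by-byte OR over two in-memory buffers: dst[i] |= src[i].
void or_bytes(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] |= src[i];
}

// Same result, processed in 64-bit words. std::memcpy is used for the
// loads and stores so the code does not depend on buffer alignment.
void or_words(unsigned char* dst, const unsigned char* src, std::size_t n) {
    std::size_t i = 0;
    for (; i + sizeof(std::uint64_t) <= n; i += sizeof(std::uint64_t)) {
        std::uint64_t d, s;
        std::memcpy(&d, dst + i, sizeof d);
        std::memcpy(&s, src + i, sizeof s);
        d |= s;
        std::memcpy(dst + i, &d, sizeof d);
    }
    // Handle the remaining tail bytes one at a time.
    for (; i < n; ++i) dst[i] |= src[i];
}
```

Either function can be applied to each pair of chunks after they have been read into memory.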