Question on STL internals

Developer https://www.devze.com 2023-02-21 15:04 Source: Internet

I am currently writing some abstractions for binary-data IO, and I am not sure how well the STL performs on some of these tasks. For example, I have a lot of things I can encode in binary to either a char * or a std::vector. For now, whenever I have an object of this kind of byte type, I either just write it using ostream::write() or do a std::copy of the array to an ostream_iterator on the stream. Now I am wondering what std::copy does internally.
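For reference, the two approaches described above might look like this for a std::vector<char>. This is a minimal sketch and the function names are hypothetical. The std::ostreambuf_iterator variant is an addition not in the original question: it hands each character straight to the stream's buffer instead of going through the formatted operator<< that std::ostream_iterator uses.

```cpp
#include <algorithm>
#include <iterator>
#include <ostream>
#include <vector>

// Option 1: one unformatted bulk write of the whole buffer.
void write_bulk(std::ostream& out, const std::vector<char>& bytes) {
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Option 2: std::copy through an ostream_iterator<char>.
// Every element is inserted with operator<<, i.e. formatted output per byte.
void write_via_copy(std::ostream& out, const std::vector<char>& bytes) {
    std::copy(bytes.begin(), bytes.end(), std::ostream_iterator<char>(out));
}

// Variant (not in the question): ostreambuf_iterator writes to the
// stream's streambuf directly, skipping the formatted-output layer.
void write_via_streambuf_iterator(std::ostream& out, const std::vector<char>& bytes) {
    std::copy(bytes.begin(), bytes.end(), std::ostreambuf_iterator<char>(out));
}
```

All three produce the same bytes on the stream; they differ only in how much per-character machinery sits between the vector and the underlying buffer.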

From what I have heard, the STL is allowed to optimize anything. For example, in theory, copying one vector of chars into another with std::copy should not copy the chars slowly byte by byte, but rather use system primitives for copying chunks of data where available. How is this done internally?

The reason I am asking is that I am now trying to switch the file over to mmapped memory instead of std::ostreams. This means that writing the char* data will be really simple, but writing vectors will be byte by byte. What would I have to provide in my class for the STL to optimize the copying away (probably using memcpy)? I am guessing I need the right kind of iterators, but what do they need so that the STL knows it can just memcpy instead of walking them?
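A sketch of the situation being asked about, assuming the mmapped region is already available as a writable char* with enough room left (the helper names are hypothetical). Both versions write the same bytes; whether the std::copy version compiles down to a single memmove/memcpy call is up to the library and the optimizer, not something the standard guarantees.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: append `bytes` at `dest` (e.g. a position inside an
// mmapped region) and return the position just past the written data.
// The caller must guarantee at least bytes.size() writable bytes at dest.
char* append_with_copy(char* dest, const std::vector<char>& bytes) {
    // Contiguous source, raw-pointer destination: std::copy returns
    // dest + bytes.size().
    return std::copy(bytes.begin(), bytes.end(), dest);
}

// The same operation spelled out with memcpy on the vector's buffer.
char* append_with_memcpy(char* dest, const std::vector<char>& bytes) {
    if (!bytes.empty()) {  // data() may be null for an empty vector
        std::memcpy(dest, bytes.data(), bytes.size());
    }
    return dest + bytes.size();
}
```

Since a std::vector stores its elements contiguously, bytes.data() and bytes.size() are always available, so nothing forces a byte-by-byte walk even without relying on library optimizations.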

I know this is asking about a lot of things I should not normally care about (the principle of encapsulation is usually a great thing). And of course I know Knuth's rule about optimization; that is why I care about the STL's automatic optimization facilities.

iostream is for formatted (i.e. text) IO only. If you want binary IO, you have to use streambuf classes.

Also, iostreams have a reputation for being slow (for various reasons, and your mileage may vary).

Iostreams use a streambuf internally, which adds a layer of indirection and provides automatic buffering. If you need reasonable binary IO throughput, you may want to use streambuf-derived classes directly (like std::filebuf), benchmark them, and disable synchronization with stdio.
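A minimal sketch of using std::filebuf directly, assuming the whole buffer is written in one go (the function name is hypothetical). sputn hands the entire block to the buffer in a single call instead of going through std::ostream's sentry and formatting state.

```cpp
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper: write `bytes` to `path` through a std::filebuf.
// Returns true only if all bytes were accepted and the file closed cleanly.
bool write_file_via_filebuf(const std::string& path, const std::vector<char>& bytes) {
    std::filebuf buf;
    if (!buf.open(path, std::ios::out | std::ios::binary | std::ios::trunc)) {
        return false;
    }
    const auto n = static_cast<std::streamsize>(bytes.size());
    const bool ok = n == 0 || buf.sputn(bytes.data(), n) == n;
    // close() flushes the buffer and returns nullptr on failure.
    return buf.close() != nullptr && ok;
}
```

Note that std::ios_base::sync_with_stdio(false) only affects the standard streams (std::cin, std::cout, std::cerr, std::clog and their wide counterparts); it has no effect on a std::filebuf you open yourself.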

Or you can use mmap or write directly. Those functions are quite simple to use, and it should be easy to write your own classes around them.
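To illustrate how small such a wrapper can be, here is a POSIX-only sketch (open, ftruncate, mmap, msync, munmap) that writes a byte buffer to a file through a shared mapping. The function name and the minimal error handling are assumptions made for the example; a real writer class would more likely keep the mapping open and grow it as data arrives.

```cpp
#include <cstddef>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (POSIX only): write `len` bytes to a new file at `path`
// by sizing the file, mapping it, and copying the bytes into the mapping.
bool write_file_via_mmap(const char* path, const char* data, std::size_t len) {
    const int fd = ::open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return false;
    // The file must be at least as large as the region we write through.
    if (::ftruncate(fd, static_cast<off_t>(len)) != 0) {
        ::close(fd);
        return false;
    }
    if (len == 0) {  // mmap rejects zero-length mappings
        return ::close(fd) == 0;
    }
    void* addr = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED) {
        ::close(fd);
        return false;
    }
    std::memcpy(addr, data, len);                // plain memory copy into the file
    bool ok = ::msync(addr, len, MS_SYNC) == 0;  // flush dirty pages now
    ok = (::munmap(addr, len) == 0) && ok;
    ok = (::close(fd) == 0) && ok;
    return ok;
}
```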

Oh, and don't assume anything about what the standard library does. If you want to know more about how it does things internally, check the sources of, e.g., the GNU implementation (libstdc++).

If you aren't sure how well the STL performs, there is no substitute for testing. Time how long it takes to std::copy a chunk of data many times, then how long it takes to copy the same amount of data using memcpy, and compare.
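A rough sketch of that experiment using std::chrono::steady_clock (the struct and function names are hypothetical). Build it with optimizations enabled, run it several times, and try several buffer sizes; the numbers will differ between compilers, standard libraries and machines.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

struct CopyTimings {
    double copy_ms;      // total time for the std::copy loop
    double memcpy_ms;    // total time for the std::memcpy loop
    bool buffers_match;  // reading dst keeps the copies from being optimized away
};

// Time `reps` copies of a `bytes`-sized buffer (bytes must be > 0).
CopyTimings time_copies(std::size_t bytes, int reps) {
    std::vector<char> src(bytes, 'x');
    std::vector<char> dst(bytes);
    using Clock = std::chrono::steady_clock;

    const auto t0 = Clock::now();
    for (int i = 0; i < reps; ++i) {
        src[i % bytes] = static_cast<char>(i);  // vary the input between iterations
        std::copy(src.begin(), src.end(), dst.begin());
    }
    const auto t1 = Clock::now();
    for (int i = 0; i < reps; ++i) {
        src[i % bytes] = static_cast<char>(i);
        std::memcpy(dst.data(), src.data(), bytes);
    }
    const auto t2 = Clock::now();

    using Ms = std::chrono::duration<double, std::milli>;
    return {Ms(t1 - t0).count(), Ms(t2 - t1).count(), src == dst};
}
```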

Doing these tests yourself will be far more instructive than worrying about STL optimisation.

It's not really clear what you're asking. You mention vectors, std::copy, char* and memory-mapped files, but there's no obvious connection between them. Show us some code, or describe what you're trying to do, and with what kind of data types.

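But a common optimization in STL implementations is to use memcpy or a similar raw memory copying mechanism (such as memmove) as long as the object type you're copying is POD (more precisely, trivially copyable) and the iterators are pointers, or library iterators such as std::vector's that the implementation can unwrap to pointers. So, assuming this optimization exists in your STL implementation, all you have to do is make sure the objects you're copying are POD types.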
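To illustrate the idea, here is a hypothetical, simplified copy function that dispatches the way this answer describes. It is not the code of any particular standard library: real implementations (libstdc++, for example) also unwrap their own std::vector and std::string iterators to raw pointers before deciding, and they use memmove rather than memcpy because std::copy permits overlapping ranges as long as the destination starts before the source. The Header struct is a made-up example of a POD record.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

// True when both iterators are raw pointers to the same trivially copyable,
// non-volatile type, so the range can be copied as raw bytes.
template <class InIt, class OutIt>
constexpr bool can_copy_bytes() {
    if constexpr (std::is_pointer_v<InIt> && std::is_pointer_v<OutIt>) {
        using Src = std::remove_const_t<std::remove_pointer_t<InIt>>;
        using Dst = std::remove_pointer_t<OutIt>;
        return std::is_same_v<Src, Dst> && std::is_trivially_copyable_v<Dst> &&
               !std::is_volatile_v<Dst>;
    } else {
        return false;
    }
}

// Hypothetical simplified std::copy: one memmove for eligible ranges,
// element-by-element assignment for everything else.
template <class InIt, class OutIt>
OutIt sketch_copy(InIt first, InIt last, OutIt out) {
    if constexpr (can_copy_bytes<InIt, OutIt>()) {
        const std::ptrdiff_t n = last - first;
        if (n > 0) {
            std::memmove(out, first, static_cast<std::size_t>(n) * sizeof(*out));
        }
        return out + n;
    } else {
        for (; first != last; ++first, ++out) {
            *out = *first;
        }
        return out;
    }
}

// Made-up POD record; the static_assert documents the requirement.
struct Header {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t flags;
};
static_assert(std::is_trivially_copyable_v<Header>);
```

Copying an array of Header through pointers takes the memmove branch, while copying from, say, a std::list<int> falls back to the loop; a type like std::string is never eligible because it is not trivially copyable.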

But as previously mentioned, the only way to get reliable information about performance is to profile/measure/benchmark it yourself.