Is it possible to reuse a binary_oarchive instance?

Devze (https://www.devze.com), 2023-03-12 18:32. Source: the web
My question is the same as discussed in this thread from five years ago (which has no good answer).

I'm serializing my objects into a byte buffer, like so:

std::string serial_str;
for (int i = 1; i < 10000; i++)
{
    boost::iostreams::back_insert_device<std::string> inserter(serial_str);
    boost::iostreams::stream<boost::iostreams::back_insert_device<std::string> > s(inserter);
    boost::archive::binary_oarchive oa(s);

    oa << obj;

    s.flush();

    // code to send serial_str's content to another process, omitted.

    serial_str.clear(); // clear the buffer so it can be reused to serialize the next object
}

When I do this in a loop, the performance is quite bad: I get ~14,000 objects / sec.

I've narrowed the problem down to the recreation of the binary_oarchive. If I just write into the same string with the same archive instance in a loop, I get ~220,000 objects/sec, but then the objects are serialized sequentially, one after the other, which isn't what I want: I want to clear and reuse the same buffer (seek to its beginning) after each object is serialized.

How can I do that?


Yes, you absolutely can reuse it, in a sense. The oarchive simply wraps a stream and doesn't know what's going on with the stream's data, so the trick is to implement your own stream (which isn't fun) that lets you "reset" the actual underlying data stream. I've written something like this before and it works wonderfully.
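To make the "resettable stream" idea concrete, here is a minimal sketch using only the standard library. The names ResettableBuf and write_two_messages are hypothetical, not part of Boost, and plain text is written instead of a real archive so the example stays self-contained. The assumption is that a std::ostream built on such a buffer would be handed to the binary_oarchive, and reset() would be called between objects.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of Boost): an output streambuf over a
// caller-owned, fixed-size buffer whose write position can be rewound.
class ResettableBuf : public std::streambuf {
public:
    ResettableBuf(char* data, std::size_t size) : data_(data), size_(size) {
        reset();
    }

    // Rewind the put area so the next write starts at the first byte again.
    void reset() { setp(data_, data_ + size_); }

    // Number of bytes written since construction or the last reset().
    std::size_t written() const {
        return static_cast<std::size_t>(pptr() - pbase());
    }

private:
    char* data_;
    std::size_t size_;
};

// Writes two messages through one long-lived stream, rewinding in between.
// Returns what the buffer holds after the second write.
std::string write_two_messages() {
    char storage[64];
    ResettableBuf buf(storage, sizeof storage);
    std::ostream os(&buf);  // in the real case, the archive would wrap this

    os << "first object";
    assert(buf.written() == 12);  // the first message occupies the buffer

    buf.reset();  // reuse the same memory for the next message
    os << "second";
    return std::string(storage, buf.written());
}
```

Because overflow() is not overridden, a write past the end of the buffer fails and sets the stream's badbit instead of growing the storage, so a real implementation would need to size the buffer for the largest object or handle that case.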

Some gotchas to be aware of though:

The oarchive won't keep writing out header information (since if it persists it's treating everything as one big stream), so you'll want to disable the headers:

boost::archive::binary_oarchive oa(s, boost::archive::no_codecvt | boost::archive::no_header);

Also, because you're reusing an oarchive, you have to be extremely careful about managing its internal type table. If all you're serializing are ints, floats, etc, then you'll be fine, but as soon as you start serializing classes, strings, and the like you can't rely on the default type enumeration that the archive uses when reusing the archive like this. The Boost documentation doesn't really get into this, but for anything complex, you need to do the following for every type the archive will come across:

oa.template register_type<std::string>();
oa.template register_type<MyClass>();
oa.template register_type<std::shared_ptr<MyClass> >();

And so on, for all your types, all std::vectors of them, all std::shared_ptrs of them, etc. This is vital. Otherwise you'll only be able to read back your streams if you use a shared iarchive and read them in the exact same order they were serialized out.

The consequence is that your iarchive needs to register all the types in the exact same way and order as the oarchive (I wrote some handy helpers using mpl to help me with this).

Serializing back in through an iarchive can also share the same iarchive, however all the same conditions apply:

  • You need to write your own stream (so it can be redirected/reset)
  • Disable the archive headers
  • Register the types

So yes, reusing an oarchive/iarchive is possible, but it's a bit of a pain. Once you've got it sorted out though, it's pretty awesome.
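For the reading side, the "stream that can be redirected/reset" condition listed above can be sketched with the standard library alone. RedirectableSource and read_two_messages are hypothetical names, and whitespace-delimited text stands in for archive data. The assumption is that the std::istream built on this buffer would be the one passed to the long-lived iarchive.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>

// Hypothetical helper (not part of Boost): an input streambuf that can be
// pointed at a different block of memory for every incoming message.
class RedirectableSource : public std::streambuf {
public:
    // Aim the get area at a new message; reading restarts at its first byte.
    void redirect(const char* data, std::size_t size) {
        // The streambuf API takes char*, but this class never writes through it.
        char* p = const_cast<char*>(data);
        setg(p, p, p + size);
    }
};

// Reads two separate messages through one long-lived stream.
std::string read_two_messages() {
    const std::string first = "alpha";
    const std::string second = "beta";

    RedirectableSource src;
    std::istream is(&src);  // in the real case, the iarchive would wrap this

    std::string a, b;
    src.redirect(first.data(), first.size());
    is >> a;

    is.clear();  // drop the eof flag left by reaching the end of the first message
    src.redirect(second.data(), second.size());
    is >> b;

    return a + "," + b;
}
```

The source memory has to stay alive for as long as the stream may read from it, since the buffer only stores pointers into it.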


Here is the solution I came up with. It does not require implementing your own stream, and it lets you reuse the same chunk of memory for each subsequent serialization. Suppose you have the following objects set up for serialization:

boost::iostreams::basic_array<char> sink; // target buffer 
boost::iostreams::stream<boost::iostreams::basic_array<char> > os;  // stream wrapper around it
boost::archive::binary_oarchive oa;  // archive which uses this stream

Then to reuse the same buffer just reopen the stream:

os.close();
os.open(sink);

This should be as fast as changing some internal pointers inside the stream, although I have not measured the actual speed.

Code for trying this out: the Writer serializes the pointer it is given into the buffer, and the Reader deserializes a pointer from the same buffer (the reader and writer share one buffer).

#include <cassert>
#include <iostream>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/iostreams/device/array.hpp>
#include <boost/iostreams/stream.hpp>
#include <boost/serialization/export.hpp>
#include <boost/serialization/access.hpp>

// A must be a complete type before Writer::write and Reader::read use it,
// so it is defined first.
class A
{
private:
  friend class boost::serialization::access;
  int i;

  template <typename Archive>
  void serialize(Archive& ar, const unsigned int version) {
    std::cout << "serialize A\n";
    ar & i;
  }
public:
  A(): i(0) {}
  A(int _i): i(_i) {}
  virtual bool operator==(const A& r) const { return i == r.i; }

  virtual ~A() {}
  virtual void whoa() { std::cout << "I am A!\n"; }
  virtual const char* me() { return "A"; }
};

// Serializes objects into a caller-owned buffer through one long-lived archive.
class Writer {
    char *buf;
    int len;
    boost::iostreams::basic_array<char> sink;
    boost::iostreams::stream<boost::iostreams::basic_array<char> > os;
    boost::archive::binary_oarchive oa;
public:
    Writer(char *_buf, int _len): buf(_buf), len(_len), sink(buf, len), os(sink), oa(os) {}
    void write(A* a) {
        oa << a;
    }
    // Reopen the stream so the next write starts at the beginning of the buffer.
    void reset() {
        os.close();
        os.open(sink);
    }
};

// Deserializes objects from the same buffer through one long-lived archive.
class Reader {
    char *buf;
    int len;
    boost::iostreams::basic_array_source<char> src;
    boost::iostreams::stream<boost::iostreams::basic_array_source<char> > is;
    boost::archive::binary_iarchive ia;
public:
    Reader(char *_buf, int _len): buf(_buf), len(_len), src(buf, len), is(src), ia(is) {}
    A* read() {
        A* a;
        ia >> a;
        return a;
    }
    // Reopen the stream so the next read starts at the beginning of the buffer.
    void reset() {
        is.close();
        is.open(src);
    }
};

int main(int argc, char **argv) {
    // to memory
    char buffer[4096] = {0};

    Writer w(buffer, sizeof(buffer));
    A *a1 = new A(5);
    w.write(a1);

    Reader r(buffer, sizeof(buffer));
    A *a2 (NULL);
    a2 = r.read();

    assert(*a1 == *a2);
    std::cout << "Simple ok\n";

    // test reuse
    w.reset();
    r.reset();

    A *a3 (NULL);
    w.write(new A(10));
    a3 = r.read();

    assert(*a3 == A(10));
    std::cout << "Reuse ok\n";
}


One solution that doesn't require looking much further is to remember the string's length after the previous object and take the substring between that length and the current length; that substring is the most recently serialized object. Every 10 or 100 iterations you can recreate the binary_oarchive and clear serial_str so that previously encoded objects don't pile up in it.
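The bookkeeping for this approach might look like the sketch below. It uses only the standard library: append_record is a hypothetical stand-in for `oa << obj` writing into serial_str, and slice_records and restart_every are illustrative names as well. In the real case the periodic restart would also mean constructing a new binary_oarchive, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for `oa << obj`: appends one encoded object to the
// shared buffer, as a long-lived archive would.
void append_record(std::string& buffer, int value) {
    buffer += "obj#" + std::to_string(value) + ";";
}

// Appends `count` records to one growing buffer and returns the slice that
// belongs to each record, clearing the buffer every `restart_every` records.
std::vector<std::string> slice_records(int count, int restart_every) {
    assert(restart_every > 0);

    std::vector<std::string> sent;
    std::string buffer;        // plays the role of serial_str
    std::size_t last_len = 0;  // buffer length before the current record

    for (int i = 1; i <= count; ++i) {
        append_record(buffer, i);

        // Only the bytes added by this record are sent on.
        sent.push_back(buffer.substr(last_len));
        last_len = buffer.size();

        // Periodic restart so old records do not accumulate.
        if (i % restart_every == 0) {
            buffer.clear();
            last_len = 0;
        }
    }
    return sent;
}
```

Calling slice_records(5, 2) yields five slices, one per record, even though the buffer is cleared only after the second and fourth records. To avoid the copy made by substr, the code that sends the data could instead be given a pointer and a length into the buffer.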
