Is "serialisation without duplication" possible in c++0x?_问答_开发者

One of the big uses of code generation in c++ is to support message serialisation. Typically, you want to support specifying message contents and layout in the same step and produce code for that message type that can give you objects capable of being serialised to/from communication streams. In the past, this has usually resulted in code that looks like:

class MyMessage : public SerialisableObject
{
  // message members
  int myNumber_;
  std::string myString_;
  std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;

public:
  // ctor, dtor, accesors, mutators, then:

  virtual void Serialise(SerialisationStream & stream)
  {
    stream & myNumber_;
    stream & myString_;
    stream & aBunchOfThingsIWantToSerialise_;
  }
};

The problem with using this kind of design is that violates an important rule of good architecture: you should not have to specify the intent of a design twice. Duplication of intent, like duplicated code and other common development duplication, leaves room for one place in the code to become divergent with the other, causing errors.

In the above, the duplication is the list of members. Potential errors include adding a member to the class but forgetting to add it to the serialisation list, serialising a member twice (possibly by not using the same order as the member declaration or possibly due to a misspelling of a similar member, among other ways), or serialising something that is not a member (which might produce a compiler error, unless name lookup finds something at a different scope than the object that matches lookup rules). That kind of mistake is the same reason we no longer try to match every heap allocation with a delete (instead using smart pointers) or ever file open with a close (using RAII 开发者_开发百科ctor//dtor mechanisms) - we don't want to have to match up our intent in multiple places because there are times we - or another engineer less familiar with the intent - make mistakes.

Generally, therefore, this has been one of the things that code generation could take care of. You might create a file MyMessage.cg to specify both layout and members in one step

serialisable MyMessage
{
  int myNumber_;
  std::string myString_;
  std::vector<MyOtherSerialisableObject> aBunchOfThingsIWantToSerialise_;
};

that would be run through a code generation utility and produce the code.

I was wondering if it was possible yet to do this in c++0x without external code generation. Are there any new language mechanisms that make it possible to specify a class as serialisable once, and the names and layout of it's members are used to layout the message during serialisation?

To be clear, I know that there are tricks with boost tuples and fusion that can come close to this kind of behavior even in the pre-c++0x language. Those usages, though, being based on indexing into the tuple rather than by-member-name access, have all been brittle to changing the layout, as other places in the code that access the messages would then also need to be reordered. Some kind of by-member-name access is necessary to not have to duplicate the layout specification in places in the code that use the messages.

Also, I know it might be nice to take this up to the next level and ask for specifying when some of the members shouldn't be serialised. Other languages that offer serialisation built in often offer some kind of attribute to do this, so int myNonSerialisedNumber_ [[noserialise]]; might seem natural. However, I personally think it is bad design to have serialisable objects where everything is not serialised, since the lifetime of messages is in the transport to/from the communications layer, separate from other data lifetimes. Also, you could have an object which has a purely serialisable as on of it's members, so such functionality doesn't by anything the language doesn't already offer.

Is this possible? Or did the standards committee leave out this kind of introspective capability? I don't need it to look like the code gen file above - any simple method for compiletime specification of layout and members in a single step would solve this common problem.

This is both possible and practical in C++11 – in fact it was possible back in C++03, the syntax was just a little too unwieldy. I wrote a small library based around the same idea - see the following:

www.github.com/molw5/framework

Sample syntax:

class Object : serializable <Object,
    value <NAME(“Field 1”), int>,
    value <NAME(“Field 2”), float>,
    value <NAME(“Field 3”), double>>
{
};

Most of the underlying code could be reproduced, in principal, in C++03 – some of the implementation details without variadic templates would have been...tricky, but I believe it would have been possible to recover the core functionality. What you could not reproduce in C++03 was the NAME macro above and the syntax relies fairly heavily on it. The macro provides the machinery necessary to generate a unique typename from a string, that is the following:

NAME(“Field 1”)

expands to

 type_string <'F', 'i', 'e', 'l', 'd', ' ', '1'>

through the use of some common macros and constexpr (for character extraction). Back in C++03 something similar to the type_string above would need to be entered manually.

C++, of any form, supports neither introspection nor reflection (to the extent that they are different).

One nice thing about doing serialization manually (ie: without introspection or reflection) is that you can provide object versioning. You can support older forms of the serialization, and simply create reasonable defaults for the data that wasn't in the old versions. Or if a new version removes some data, you can simply serialize and discard it.

It seems to me that what you need is Boost.Serialization.