I'm searching for a very fast binary serialization technique for C++. I only need to serialize data contained in objects (no pointers etc.). I'd like it to be as fast as possible. If it's specific to x86 hardware, that's acceptable.
I'm familiar with the C methods of doing this. As a test I've benchmarked a couple of techniques. I've found the C method is 40% faster than the best C++ method I implemented.
Any suggestions on how to improve the C++ method (or libraries that do this)? Anything good available for memory mapped files?
// C-style writes
{
#pragma pack(1)
    struct item
    {
        uint64_t off;
        uint32_t size;
    } data;
#pragma pack()

    clock_t start = clock();
    FILE* fd = fopen( "test.c.dat", "wb" );
    for ( long i = 0; i < tests; i++ )
    {
        data.off = i;
        data.size = i & 0xFFFF;
        fwrite( (char*) &data, sizeof(data), 1, fd );
    }
    fclose( fd );
    clock_t stop = clock();
    double d = ((double)(stop-start)) / CLOCKS_PER_SEC;
    printf( "%8.3f seconds\n", d );
}
About 1.6 seconds for tests = 10000000
// C++ style ofstream writes

// define a DTO class
class test
{
public:
    test() {}

    uint64_t off;
    uint32_t size;

    friend std::ostream& operator<<( std::ostream& stream, const test& v );
};

// write to the stream
std::ostream& operator<<( std::ostream& stream, const test& v )
{
    stream.write( (const char*)&v.off, sizeof(v.off) );
    stream.write( (const char*)&v.size, sizeof(v.size) );
    return stream;
}
{
    test data;
    clock_t start = clock();
    std::ofstream out;
    out.open( "test.cpp.dat", std::ios::out | std::ios::trunc | std::ios::binary );
    for ( long i = 0; i < tests; i++ )
    {
        data.off = i;
        data.size = i & 0xFFFF;
        out << data;
    }
    out.close();
    clock_t stop = clock();
    double d = ((double)(stop-start)) / CLOCKS_PER_SEC;
    printf( "%8.3f seconds\n", d );
}
About 2.6 seconds for tests = 10000000
There are just very few real-life cases where that matters at all. You only ever serialize to make your objects compatible with some kind of external resource: disk, network, etcetera. The code that transmits the serialized data to that resource is always orders of magnitude slower than the code needed to serialize the object. If you make the serialization code twice as fast, you've made the overall operation no more than 0.5% faster, give or take. That is worth neither the risk nor the effort.
Measure three times, cut once.
If the task to be performed is really serialization, you might check out Google's Protocol Buffers. They provide fast serialization of C++ classes. The site also mentions some alternative libraries, e.g. boost.serialization (only to state that protocol buffers outperform them in most cases, of course ;-).
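For reference, a minimal sketch of what the write loop from the question might look like with protobuf, assuming a message named Item with fields off and size has been defined in a .proto file and compiled with protoc (the header name, message name, and function below are illustrative):

// Assumed schema (illustrative): message Item { uint64 off = 1; uint32 size = 2; }
#include <fstream>
#include "item.pb.h"   // generated by protoc from the assumed item.proto

void write_protobuf(long tests)
{
    std::ofstream out("test.pb.dat", std::ios::out | std::ios::trunc | std::ios::binary);
    Item data;
    for (long i = 0; i < tests; i++)
    {
        data.set_off(i);
        data.set_size(i & 0xFFFF);
        // Serializes one message; note that back-to-back messages in one file
        // need a length prefix or similar framing to be read back individually.
        data.SerializeToOstream(&out);
    }
}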
The C++ Middleware Writer is an online alternative to serialization libraries. In some cases it is faster than the serialization library in Boost.
Google FlatBuffers is similar to Protocol Buffers, but considerably faster:
https://google.github.io/flatbuffers/
https://google.github.io/flatbuffers/md__benchmarks.html
Well, if you want the fastest serialization possible, then you can just write your own serialization class and give it methods to serialize each of the POD types.
The less safety you bring in, the faster it'll run and the harder it'll be to debug; however, there is only a fixed number of built-in types, so you could enumerate them.
class Buffer
{
public:
    inline Buffer& operator<<(int i); // etc...
private:
    std::deque<unsigned char> mData;
};
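A rough sketch of how such a buffer could be fleshed out; the names, overload set, and choice of container are illustrative rather than a definitive implementation:

#include <cstddef>
#include <cstdint>
#include <deque>

class Buffer
{
public:
    // One overload per built-in type; each just appends the raw bytes.
    Buffer& operator<<(std::uint32_t v) { return append(&v, sizeof v); }
    Buffer& operator<<(std::uint64_t v) { return append(&v, sizeof v); }

    std::size_t size() const { return mData.size(); }

private:
    Buffer& append(const void* p, std::size_t n)
    {
        const unsigned char* b = static_cast<const unsigned char*>(p);
        mData.insert(mData.end(), b, b + n);
        return *this;
    }

    std::deque<unsigned char> mData; // a std::vector<unsigned char> may be faster for bulk writes
};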
I must admit I don't understand your problem:
- What do you actually want to do with the serialized message ?
- Are you saving it for later ?
- Do you have to worry about forward / backward compatibility ?
There might be better approaches than serialization.
Is there any way you can take advantage of things that stay the same?
I mean, you are just trying to run through "test.c.dat" as fast as you possibly can, right? Can you take advantage of the fact that the file does not change between your serialization attempts? If you are trying to serialize the same file over and over again, you can optimize based on this. I can make the first serialization attempt take the same amount of time as yours, plus a tiny bit extra for another check, and then if you run the serialization again on the same input, I can make the second run go much faster than the first.
I understand that this may just be a carefully crafted example, but you seem to be focused on making the language accomplish your task as quickly as possible, instead of asking the question of "do I need to accomplish this again?" What is the context of this approach?
I hope this is helpful.
-Brian J. Stinar-
If you're on a Unix system, mmap on the file is the way to do what you want to do. See http://msdn.microsoft.com/en-us/library/aa366556(VS.85).aspx for an equivalent on Windows.
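A minimal POSIX sketch of the idea, assuming the record count is known up front so the file can be sized with ftruncate before mapping; error handling is omitted and the function name is illustrative:

#include <cstddef>
#include <cstdint>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#pragma pack(push, 1)
struct item { std::uint64_t off; std::uint32_t size; };
#pragma pack(pop)

void mmap_write(long tests)
{
    std::size_t bytes = tests * sizeof(item);
    int fd = open("test.mmap.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    ftruncate(fd, bytes);                           // the file must be large enough before mapping
    item* p = static_cast<item*>(
        mmap(nullptr, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0));
    for (long i = 0; i < tests; i++)                // records go straight into the page cache
    {
        p[i].off  = i;
        p[i].size = i & 0xFFFF;
    }
    munmap(p, bytes);
    close(fd);
}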
A lot of the performance is going to depend on memory buffers and how you fill up blocks of memory before writing to disk. And there are some tricks to making standard C++ streams a little faster, like std::ios_base::sync_with_stdio(false).
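For example, a small sketch combining that call with a larger user-supplied stream buffer; the 1 MiB size and the helper name are arbitrary choices:

#include <fstream>
#include <ios>
#include <vector>

void open_buffered(std::ofstream& out, std::vector<char>& buf)
{
    std::ios_base::sync_with_stdio(false);          // decouple iostreams from C stdio
    out.rdbuf()->pubsetbuf(buf.data(),              // must be set before open() to take effect
                           static_cast<std::streamsize>(buf.size()));
    out.open("test.cpp.dat", std::ios::out | std::ios::trunc | std::ios::binary);
}

// usage: std::vector<char> buf(1 << 20); std::ofstream out; open_buffered(out, buf);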
But IMHO, the world doesn't need another implementation of serialization. Here are some that other folks maintain that you might want to look into:
- Boost: Fast, assorted C++ library including serialization (a minimal Boost.Serialization sketch follows this list)
- protobuf: Fast cross-platform, cross-language serialization with C++ module
- thrift: Flexible cross-platform, cross-language serialization with C++ module
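As an illustration of the first option, a minimal Boost.Serialization sketch for the struct from the question; binary archives add a small header and versioning overhead, so this is a convenience trade-off rather than a way to beat the raw fwrite loop:

#include <cstdint>
#include <fstream>
#include <boost/archive/binary_oarchive.hpp>

struct test
{
    std::uint64_t off;
    std::uint32_t size;

    // Boost calls this for both saving and loading.
    template <class Archive>
    void serialize(Archive& ar, const unsigned int /*version*/)
    {
        ar & off & size;
    }
};

int main()
{
    std::ofstream ofs("test.boost.dat", std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    test data = { 1, 2 };
    oa << data;
    return 0;
}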
Because I/O is most likely to be the bottleneck, a compact format may help. Out of curiosity I tried the following Colfer scheme, compiled as colf -s 16 C.
package data

type item struct {
    off  uint64
    size uint32
}
... with a comparable C test:
clock_t start = clock();
data_item data;
void* buf = malloc(colfer_size_max);
FILE* fd = fopen( "test.colfer.dat", "wb" );
for ( long i = 0; i < tests; i++ )
{
    data.off = i;
    data.size = i & 0xFFFF;
    size_t n = data_item_marshal( &data, buf );
    fwrite( buf, n, 1, fd );
}
fclose( fd );
clock_t stop = clock();
The results are quite disappointing on SSD, despite the fact that the serialized size is 40% smaller than the raw struct dumps.
colfer took 0.520 seconds
plain took 0.320 seconds
Since the generated code is pretty fast it seems unlikely you'll win anything with serialization libraries.
Both your C and your C++ code will probably be dominated (in time) by file I/O. I would recommend using memory-mapped files when writing your data and leaving the I/O buffering to the operating system. Boost.Interprocess could be an alternative.
To really answer this question: the reason the C++ version is slow is that it calls ostream::write too many times, which induces a huge amount of unnecessary state checking. You can create a simple buffer and use only one write, and you will see the difference.
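A sketch of that idea applied to the loop from the question: build the whole output in memory first, then issue a single write. The buffer sizing assumes the packed 12-byte record from the C version, and the function name is illustrative:

#include <cstdint>
#include <cstring>
#include <fstream>
#include <vector>

void buffered_write(long tests)
{
    const std::size_t rec = sizeof(std::uint64_t) + sizeof(std::uint32_t); // 12 bytes per record
    std::vector<char> buf(tests * rec);   // for very large runs, flush in chunks instead
    char* p = buf.data();
    for (long i = 0; i < tests; i++)
    {
        std::uint64_t off  = i;
        std::uint32_t size = i & 0xFFFF;
        std::memcpy(p, &off,  sizeof off);  p += sizeof off;
        std::memcpy(p, &size, sizeof size); p += sizeof size;
    }
    std::ofstream out("test.buffered.dat", std::ios::out | std::ios::trunc | std::ios::binary);
    out.write(buf.data(), static_cast<std::streamsize>(buf.size())); // one call, one round of state checks
}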
If your disk/network is really fast enough not to become the bottleneck, flatbuffers and capnproto are great options to handle this for you. Otherwise, protobuf, xxx-compact, ... whatever uses varint encoding can probably serialize this data to a quarter of the original size.
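For reference, a sketch of the LEB128-style varint encoding these formats use; small values, like the loop counters in the benchmark, occupy 1-3 bytes instead of a fixed 8 or 4:

#include <cstddef>
#include <cstdint>

std::size_t encode_varint(std::uint64_t value, unsigned char* out)
{
    std::size_t n = 0;
    while (value >= 0x80)
    {
        out[n++] = static_cast<unsigned char>((value & 0x7F) | 0x80); // low 7 bits plus continuation bit
        value >>= 7;
    }
    out[n++] = static_cast<unsigned char>(value);                     // final byte, continuation bit clear
    return n;                                                         // number of bytes written (1..10)
}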
HPS from the scientific computing community is also a great option for this kind of highly structured data, and probably the fastest in speed and the smallest in message size in this case due to its encoding scheme.