How can I send an std::vector<std::string> over a UNIX socket?

For my application, I need to be able to send an std::vector<std::string> over a local UNIX socket and get a copy of the vector on the other end of the socket. What's the easiest way to do this with O(1) messages relative to the size of the vector (i.e., without sending a message for each string in the vector)?

Since this is all on the same host, and because I control both ends of the socket, I'm not concerned with machine-specific issues such as endianness or vector/string representation.

I'd like to avoid using any external libraries for a variety of reasons.

std::string does not prevent you from having NULs inside your string. It's only when you try to use these with NUL-sensitive APIs that you run into trouble. I suspect you would have to serialize the array by prepending the size of the array and then the length of each string on the wire.

...
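// needs <arpa/inet.h>, <unistd.h>, <cstdint>, <string>, <vector>
// real code should also check every write() for errors and short writes
uint32_t length = htonl( static_cast<uint32_t>( vec.size() ) );  // fixed-width element count
write( socket, &length, sizeof( length ) );
for ( size_t i = 0; i < vec.size(); ++i ) {
    length = htonl( static_cast<uint32_t>( vec[i].length() ) );  // length prefix for this string
    write( socket, &length, sizeof( length ) );
    write( socket, vec[i].data(), vec[i].length() );  // raw bytes, embedded NULs included
}
...

Unpacking is done similarly:

...
// also needs <algorithm> for std::min
std::vector<std::string> vectorRead;
uint32_t size = 0;  // same fixed width as on the sending side
read( socket, &size, sizeof( size ) );
size = ntohl( size );
for ( uint32_t i = 0; i < size; ++i ) {
    std::string stringRead;
    uint32_t length = 0;
    read( socket, &length, sizeof( length ) );
    length = ntohl( length );
    while ( 0 < length ) {
        char buffer[1024];
        // never ask for more than the buffer holds or the string still needs
        ssize_t cread = read( socket, buffer, std::min<size_t>( sizeof( buffer ), length ) );
        if ( cread <= 0 ) {
            break;  // error or peer closed the socket; real code should stop and report this
        }
        stringRead.append( buffer, cread );
        length -= static_cast<uint32_t>( cread );
    }
    vectorRead.push_back( stringRead );
}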
...
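
Since the question asks for O(1) messages, the same length-prefixed layout can also be assembled in one buffer and handed to a single write. The following is only a sketch under a few assumptions: counts and lengths fit in 32 bits, both ends share a byte order (as the question states), and the helper names pack, unpack, put_u32 and get_u32 are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>
#include <vector>

// Append a 32-bit value in host byte order (both ends run on the same host).
static void put_u32(std::string& out, std::uint32_t v) {
    char bytes[sizeof v];
    std::memcpy(bytes, &v, sizeof v);
    out.append(bytes, sizeof v);
}

// Read a 32-bit value at `pos` and advance `pos`; throws on a truncated buffer.
static std::uint32_t get_u32(const std::string& in, std::size_t& pos) {
    std::uint32_t v;
    if (in.size() - pos < sizeof v) throw std::runtime_error("truncated buffer");
    std::memcpy(&v, in.data() + pos, sizeof v);
    pos += sizeof v;
    return v;
}

// Layout: [count][len0][bytes0][len1][bytes1]...
std::string pack(const std::vector<std::string>& vec) {
    assert(vec.size() <= UINT32_MAX);
    std::string out;
    put_u32(out, static_cast<std::uint32_t>(vec.size()));
    for (const std::string& s : vec) {
        assert(s.size() <= UINT32_MAX);
        put_u32(out, static_cast<std::uint32_t>(s.size()));
        out.append(s);  // embedded NULs are preserved
    }
    return out;
}

// Inverse of pack(); throws std::runtime_error if the buffer is too short.
std::vector<std::string> unpack(const std::string& in) {
    std::size_t pos = 0;
    std::vector<std::string> vec;
    const std::uint32_t count = get_u32(in, pos);
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::uint32_t len = get_u32(in, pos);
        if (in.size() - pos < len) throw std::runtime_error("truncated buffer");
        vec.emplace_back(in, pos, len);  // copy `len` bytes starting at `pos`
        pos += len;
    }
    return vec;
}
```

The packed buffer still needs its own total size sent first (or a datagram-style socket) so the receiver knows how many bytes to read before calling unpack.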

Packing data structures for transmission and reception is usually called serialization.

One option you could use: the Boost.Serialization library is capable of serializing STL vectors.

Another would be to roll your own, which shouldn't be difficult in this case. You could, for example, concatenate all the strings of the vector into a single buffer (with a NUL after each constituent string), send that buffer, and then split it on the NULs at the other end.
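
A minimal sketch of that roll-your-own approach, assuming no string contains an embedded NUL. The function names join_nul and split_nul are invented here for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Concatenate the strings, terminating each one with '\0'.
// Precondition (assumed): no string contains an embedded '\0'.
std::string join_nul(const std::vector<std::string>& vec) {
    std::string out;
    for (const std::string& s : vec) {
        assert(s.find('\0') == std::string::npos);
        out += s;
        out.push_back('\0');
    }
    return out;
}

// Split a buffer produced by join_nul() back into its strings.
std::vector<std::string> split_nul(const std::string& buf) {
    std::vector<std::string> vec;
    std::size_t start = 0;
    while (start < buf.size()) {
        std::size_t end = buf.find('\0', start);
        if (end == std::string::npos) end = buf.size();  // tolerate a missing final terminator
        vec.emplace_back(buf, start, end - start);
        start = end + 1;  // skip the terminator
    }
    return vec;
}
```

Because every string gets its own terminator, empty strings survive the round trip; only strings with embedded NULs would be split incorrectly.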

I'm sure I will get yelled at by C++ zealots for this, but try writev(2) (a.k.a. scatter/gather I/O). You would have to deal with zero separators on the receiving side anyway though.
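
For what it's worth, a writev(2) version might look roughly like this. It is a sketch only: send_strings_writev is a hypothetical name, the strings are assumed to contain no embedded NULs, and the vector is assumed to have fewer than IOV_MAX entries.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>  // read()/write() for the surrounding socket code
#include <vector>

// Send every string plus its terminating '\0' with a single writev() call,
// without first copying the strings into one contiguous buffer.
// Returns the number of bytes written, or -1 on error (errno is set).
ssize_t send_strings_writev(int fd, const std::vector<std::string>& vec) {
    assert(fd >= 0);
    if (vec.empty()) return 0;  // nothing to send
    std::vector<iovec> iov(vec.size());
    for (std::size_t i = 0; i < vec.size(); ++i) {
        // c_str() is NUL-terminated, so size() + 1 bytes are readable.
        // writev() never modifies the data; iov_base is merely declared non-const.
        iov[i].iov_base = const_cast<char*>(vec[i].c_str());
        iov[i].iov_len = vec[i].size() + 1;
    }
    return writev(fd, iov.data(), static_cast<int>(iov.size()));
}
```

On a stream socket the call can still write fewer bytes than requested, so a robust sender would compare the return value against the total and resend the remainder.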

The solution I ended up taking was serializing the vector of strings in the form <string1>\0<string2>\0...<stringN>\0 (sending the length of the aforementioned string beforehand). While David correctly points out that this will not work for cases where std::string contains a NUL, I can guarantee this will not be the case for my application.
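
The framing part (length first, then the NUL-separated payload) could be sketched as below. This is not the code I actually used, just an illustration: the names write_all, read_all, send_blob and recv_blob are hypothetical, the length is assumed to fit in 32 bits, and it is sent in host byte order since both ends are on the same machine.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unistd.h>

// Write exactly n bytes, retrying on short writes. Returns false on error.
bool write_all(int fd, const char* p, std::size_t n) {
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0 && errno == EINTR) continue;  // interrupted by a signal, retry
        if (w <= 0) return false;
        p += w;
        n -= static_cast<std::size_t>(w);
    }
    return true;
}

// Read exactly n bytes. Returns false on error or if the peer closes early.
bool read_all(int fd, char* p, std::size_t n) {
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

// Frame = 32-bit payload length (host byte order) followed by the payload.
bool send_blob(int fd, const std::string& blob) {
    assert(blob.size() <= UINT32_MAX);
    const std::uint32_t len = static_cast<std::uint32_t>(blob.size());
    std::string frame(reinterpret_cast<const char*>(&len), sizeof len);
    frame += blob;  // header and payload go out together
    return write_all(fd, frame.data(), frame.size());
}

// Read one frame written by send_blob() into `blob`.
bool recv_blob(int fd, std::string& blob) {
    std::uint32_t len = 0;
    if (!read_all(fd, reinterpret_cast<char*>(&len), sizeof len)) return false;
    blob.resize(len);
    return len == 0 || read_all(fd, &blob[0], len);
}
```

The receiver trusts the length it reads, which is acceptable here only because both ends of the socket are under my control.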

There is no way to send a vector directly via a socket, even on the same machine (or even in the same process, for that matter). There are two issues with this:

1. vector and string both maintain internal pointers to raw memory. This precludes sending the vector<string> to another process.
2. The dtors of the vector and string will want to delete that pointer. Socket operations will do a memcpy of your object (including the values of the raw pointers), and you will get a double deletion.
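
Both issues show up at compile time through std::is_trivially_copyable. The sketch below assumes a mainstream standard library (where std::string and std::vector are not trivially copyable); FlatRecord and can_send_raw are hypothetical names used only for illustration.

```cpp
#include <string>
#include <type_traits>
#include <vector>

// A type may be copied byte-for-byte (memcpy, write(), read()) only if it is
// trivially copyable. Owning containers are not: they hold heap pointers and
// have destructors that free them.
static_assert(!std::is_trivially_copyable<std::string>::value,
              "std::string owns heap memory");
static_assert(!std::is_trivially_copyable<std::vector<std::string> >::value,
              "std::vector owns heap memory");

// Hypothetical flat record: no pointers, no destructor, safe to send as raw bytes.
struct FlatRecord {
    unsigned int len;  // bytes used in `text`
    char text[32];
};
static_assert(std::is_trivially_copyable<FlatRecord>::value,
              "FlatRecord can be memcpy'd");

// Compile-time guard that a raw-send helper could use to reject unsafe types.
template <typename T>
constexpr bool can_send_raw() {
    return std::is_trivially_copyable<T>::value;
}
```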

So the rule is this: in order to send an object via a socket, it must be able to be memcpy'd. There are several ways to do this:

1. Serialize the vector. Things like ICE (http://www.zeroc.com/) are good at generating these serializations. These have the obvious overhead.
2. Create something with the same interface as vector and string, but that is capable of being memcpy'd.
3. Create read-only versions of something that looks like vector. The send side can be a regular vector; the recv side can reinterpret_cast the recv buffer as the read-only implementation.

Number 2 is very difficult to do in general, but with certain limitations it is possible. For high-performance apps, you aren't going to be using vector in any case.

Number 3 applies to virtually all the use cases out there, in that the reader rarely modifies the contents of the recv buffer. If the reader does not need random access iterators, and can live with ForwardIterators, the serialization is pretty easy: alloc one buffer that can hold all the strings, plus an integer for each string denoting its length, plus one int for the size of the vector.

The result can be reinterpret_cast'd to a user defined structure that is a read only collection of read only strings. So without too much trouble you can at least get O(1) on the read side.
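
A rough sketch of such a read-only, forward-only view follows. It assumes the buffer is laid out as [count][len0][bytes0][len1][bytes1]... with 32-bit host-order integers, and it reads the length fields with memcpy rather than a reinterpret_cast so that unaligned lengths are not a problem. Slice and PackedReader are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Non-owning, read-only reference to one string inside the receive buffer.
struct Slice {
    const char* data;
    std::uint32_t size;
    std::string str() const { return std::string(data, size); }  // copies; convenience only
};

// Forward-only cursor over the receive buffer. Nothing is copied or allocated
// while iterating, and the buffer must outlive the reader.
class PackedReader {
public:
    PackedReader(const char* buf, std::size_t size)
        : cur_(buf), end_(buf + size), remaining_(0) {
        assert(buf != nullptr);
        std::uint32_t count;
        if (read_u32(count)) remaining_ = count;
    }

    // Number of strings not yet visited.
    std::uint32_t remaining() const { return remaining_; }

    // Point `out` at the next string; false at the end or on a truncated buffer.
    bool next(Slice& out) {
        if (remaining_ == 0) return false;
        const char* saved = cur_;
        std::uint32_t len;
        if (!read_u32(len) || static_cast<std::size_t>(end_ - cur_) < len) {
            cur_ = saved;  // leave the cursor where it was
            return false;
        }
        out.data = cur_;
        out.size = len;
        cur_ += len;
        --remaining_;
        return true;
    }

private:
    // Copy out one 32-bit field and advance past it, if enough bytes remain.
    bool read_u32(std::uint32_t& v) {
        if (static_cast<std::size_t>(end_ - cur_) < sizeof v) return false;
        std::memcpy(&v, cur_, sizeof v);
        cur_ += sizeof v;
        return true;
    }

    const char* cur_;
    const char* end_;
    std::uint32_t remaining_;
};
```

Constructing the reader costs one integer read regardless of how many strings the buffer holds, which is where the O(1) on the read side comes from.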

To get O(1) on the send side, you would have to go with method 2. I've done this, knowing that my app will never use strings longer than X, and that the vector will never hold more than Y items. The trick is that by fixing the capacity I'll never have to go to the heap for memory. The downside is that you are sending the entire capacity of each string, and not just what was used. However, in many cases just sending everything is far faster than trying to compact it, especially if you are on the same machine. In that case you could just place this structure in shared memory and notify the recv app to look for it.
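
As an illustration of method 2 (not my production code), a fixed-capacity structure could look like the sketch below. The limits kMaxLen and kMaxItems stand in for X and Y and are arbitrary, and all type and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical limits standing in for "X" and "Y".
constexpr std::size_t kMaxLen = 31;   // longest string, in bytes
constexpr std::size_t kMaxItems = 8;  // most strings per message

struct FixedString {
    unsigned char len;   // bytes used in `data`
    char data[kMaxLen];
};

struct FixedStringVector {
    unsigned int count;            // entries used in `items`
    FixedString items[kMaxItems];
};

static_assert(kMaxLen <= 255, "len is stored in one byte");
static_assert(std::is_trivially_copyable<FixedStringVector>::value,
              "must be safe to memcpy or write() as raw bytes");

// Fill `out` from a regular vector. Returns false if the data does not fit.
bool to_fixed(const std::vector<std::string>& vec, FixedStringVector& out) {
    if (vec.size() > kMaxItems) return false;
    std::memset(&out, 0, sizeof out);  // avoid sending uninitialised bytes
    for (std::size_t i = 0; i < vec.size(); ++i) {
        if (vec[i].size() > kMaxLen) return false;
        out.items[i].len = static_cast<unsigned char>(vec[i].size());
        std::memcpy(out.items[i].data, vec[i].data(), vec[i].size());
    }
    out.count = static_cast<unsigned int>(vec.size());
    return true;
}

// Rebuild a regular vector on the receiving side, if one is needed at all.
std::vector<std::string> from_fixed(const FixedStringVector& in) {
    assert(in.count <= kMaxItems);
    std::vector<std::string> vec;
    for (unsigned int i = 0; i < in.count; ++i) {
        assert(in.items[i].len <= kMaxLen);
        vec.emplace_back(in.items[i].data, in.items[i].len);
    }
    return vec;
}
```

The whole FixedStringVector can then go out with one write of sizeof(FixedStringVector) bytes, at the cost of always sending the full capacity.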

You may want to look at Boost.Interprocess for more ideas on how to make containers that can be shoved through sockets without serialization.