How should I mark the end of a TCP packet?

In a client/server application where text data of varying length will be sent back and forth between the client and server, how should I mark the end of a packet that is being sent? For example, when the server is receiving packet data from a client, how does the server know that the client packet has been fully received?

Is it more common to tell the server the full length of the packet that it is going to receive before the data or to have something marking the end of the packet?

Some of the data sent will only be a few characters long and some could be thousands of characters.

TCP provides a continuous stream of data. TCP is implemented using packets but the whole point of TCP is to hide them.

Think of it as a wall on which you want to draw. The wall is made of bricks. The bricks are held together with mortar, and plaster is applied so that the wall surface becomes smooth. The bricks are the IP packets; TCP is the plaster.

So now you have your smooth, plastered TCP wall, and you want to add some structure to it. You want to draw boxes, so that your drawings are kept separate from each other. That is the task here: adding a bit of "administrative" structure (boxes around the drawings) to your data.

Many protocols use the concept of a packet, which is a bunch of data beginning with a fixed-format administrative header. The header contains enough information to decide where the packet ends; e.g., it includes the packet length. HTTP does that, with a Content-Length header, or (with HTTP/1.1) with the "chunked transfer encoding", where data is split into one or several mini-packets, each preceded by a simple header that gives the length of that mini-packet.
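As a rough illustration of the chunked idea, here is a minimal Python sketch that decodes a chunked body already held fully in memory. The name decode_chunked is made up for this example; it ignores trailers and does no error handling, so treat it as a sketch rather than a real HTTP parser.

```python
def decode_chunked(data: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body that is fully buffered in `data`."""
    body = bytearray()
    pos = 0
    while True:
        # Each mini-packet starts with its length in hexadecimal, ended by CR-LF.
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol].split(b";")[0], 16)  # drop any chunk extension
        pos = eol + 2
        if size == 0:
            # A zero-length chunk marks the end of the whole body.
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CR-LF


wire = b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(wire))
```

The receiver never has to scan the data for a marker: each length tells it exactly how many bytes to take next.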

Another way is to have a special terminator sequence which cannot appear in "normal data". If your data is text, then you could use a byte of value zero as terminator.
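A terminator-based receiver could look something like the sketch below. The class name NulFramer is invented for this example, and it assumes the zero byte never occurs inside a message.

```python
class NulFramer:
    """Collects stream chunks and splits them into NUL-terminated messages."""

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, chunk: bytes) -> list[bytes]:
        """Add received bytes; return every message completed so far."""
        self._buffer += chunk
        *complete, rest = bytes(self._buffer).split(b"\x00")
        # `rest` is the unfinished tail (empty if the chunk ended on a terminator).
        self._buffer = bytearray(rest)
        return complete


framer = NulFramer()
print(framer.feed(b"hel"))        # no terminator seen yet
print(framer.feed(b"lo\x00wor"))  # finishes the first message
print(framer.feed(b"ld\x00"))     # finishes the second message
```

Whatever recv() happens to return is passed to feed(); how the bytes were split across reads does not matter.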

Yet another way is to use self-terminated data. This is data structured in such a way that you can know at any point whether the end of the element has been reached. For instance, XML data is organized as nested pairs of markers such as <foo>...</foo>. When the end marker (</foo>) is reached, you know that the element is finished.
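To sketch the self-terminated approach in Python, the example below uses JSON instead of XML purely for brevity, since the standard json module can report where a value ends. It assumes every message is a JSON object ({...}) and that the text has already been decoded; extract_objects is a made-up helper name.

```python
import json

_decoder = json.JSONDecoder()


def extract_objects(buffer: str) -> tuple[list, str]:
    """Return (complete JSON objects found, unconsumed remainder of buffer)."""
    found = []
    while True:
        buffer = buffer.lstrip()  # raw_decode does not skip leading whitespace
        if not buffer:
            return found, buffer
        try:
            obj, end = _decoder.raw_decode(buffer)
        except json.JSONDecodeError:
            # Incomplete element: keep it and wait for more data.
            # (Invalid data looks the same here; real code needs a size limit.)
            return found, buffer
        found.append(obj)
        buffer = buffer[end:]


objects, pending = extract_objects('{"a": 1}{"b": [2, 3')
print(objects, repr(pending))
```

The closing brace plays the role of </foo>: the parser itself knows when the element is finished.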

Take your cues from HTTP.

Use a terminator sequence of characters, or specify a length somewhere in the message header, or use a clever combination of both.

Like HTTP does: the headers end with CR-LF-CR-LF. If there is data past the headers, the data length is in one of the headers.
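A minimal Python sketch of that combination, for a home-grown HTTP-like format rather than real HTTP (there is no request line, and split_message is a made-up name):

```python
def split_message(buffer: bytes):
    """Return (headers, body, rest) once a full message is buffered, else None."""
    head, sep, tail = buffer.partition(b"\r\n\r\n")
    if not sep:
        return None  # the blank line ending the headers has not arrived yet
    headers = {}
    for line in head.split(b"\r\n"):
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    # Assumption for this sketch: a missing Content-Length means "no body".
    length = int(headers.get(b"content-length", b"0"))
    if len(tail) < length:
        return None  # headers complete, body still arriving
    return headers, tail[:length], tail[length:]


print(split_message(b"Type: greeting\r\nContent-Length: 5\r\n\r\nhelloNEXT"))
```

The terminator delimits the variable-length headers, and the length delimits the body, so the body may contain any bytes at all.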

Beware of garbage if you encode the length at the beginning. For instance, if you use 4 binary bytes for the length and some external probe sends an HTTP request, you will likely end up with a huge number and wait forever (not to mention allocating a buffer so large that it could crash your program). I send the length twice, each copy passed through a different function (e.g. ~len and len xor 0x139AF321), and compare the two on receipt. You should also set a maximum length in case someone is actively trying to crash your program. If I get a bad length, I just close the connection.
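In Python, the double-length check might be sketched like this. The 8-byte layout, the 1 MiB cap, and the function names are assumptions made for the example; only the two transformations come from the description above.

```python
import struct

MAGIC = 0x139AF321
MASK = 0xFFFFFFFF
MAX_LENGTH = 1 << 20  # assumed cap of 1 MiB; pick whatever suits the protocol
_HEADER = struct.Struct("!II")  # two 32-bit big-endian fields


def pack_header(length: int) -> bytes:
    """Encode the length twice: once as ~len, once as len xor MAGIC."""
    return _HEADER.pack(~length & MASK, length ^ MAGIC)


def unpack_header(header: bytes) -> int:
    """Return the payload length, or raise ValueError if the header is garbage."""
    inverted, xored = _HEADER.unpack(header)
    length = ~inverted & MASK
    if length != xored ^ MAGIC:
        raise ValueError("length copies disagree")
    if length > MAX_LENGTH:
        raise ValueError("length exceeds maximum")
    return length


print(unpack_header(pack_header(1234)))
try:
    unpack_header(b"GET / HT")  # first 8 bytes of a stray HTTP request
except ValueError as exc:
    print("rejected:", exc)
```

On a ValueError the caller would simply close the connection, as described above.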

This is over and above an HMAC if your traffic is encrypted.

Structure your packet in such a way that it includes a length field at the beginning.

If the sender knows the length, then the sender should supply the length up front as a fixed-size field, followed by the variable-size data.

The advantage over a tail marker is that the receiver can optimize for the expected amount of data, e.g. allocate a buffer of the correct size. For example, protocols for storage over TCP/IP face the same problem as you. In those protocols, headers give the length of the data that follows.
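Here is one way this could look in Python, as a sketch only: a 4-byte big-endian length prefix (an assumed format) followed by the payload, with the receive buffer allocated once at the announced size. The function names are invented for the example, and real code should also cap the length before allocating.

```python
import socket
import struct

_LENGTH = struct.Struct("!I")  # assumed: 4-byte big-endian length prefix


def send_message(sock: socket.socket, payload: bytes) -> None:
    sock.sendall(_LENGTH.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes into a buffer allocated once, up front."""
    buffer = bytearray(count)
    view = memoryview(buffer)
    received = 0
    while received < count:
        # recv_into may deliver fewer bytes than requested, so loop.
        n = sock.recv_into(view[received:])
        if n == 0:
            raise ConnectionError("peer closed the connection mid-message")
        received += n
    return bytes(buffer)


def recv_message(sock: socket.socket) -> bytes:
    (length,) = _LENGTH.unpack(recv_exact(sock, _LENGTH.size))
    return recv_exact(sock, length)


# Demo over a local socket pair: a short message, then a longer one.
left, right = socket.socketpair()
send_message(left, b"hi")
send_message(left, b"x" * 5000)
print(recv_message(right), len(recv_message(right)))
left.close()
right.close()
```

The loop in recv_exact is the important part: a single recv call can return any prefix of what was sent, so the receiver keeps reading until the announced length is reached.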

Later down the road, you may find other bits to put in your "header". You'll be glad you have some structure in place to grow your own layer-5 protocol.

If you're feeling particularly bold, you can look into using SCTP sockets instead of TCP sockets: SCTP is message-oriented, so it preserves message boundaries for you.