Proper way to create packets along with sending and receiving them in socket programming using C

I had written a small client-server program in which I sent integers and characters from my client to the server, so I know the basic steps of socket programming in C. Now I want to create a packet and send it to my server. I thought I would create a structure:

    struct packet  
    { 
    int srcID;
    long int data;
    .....
    .....
    .....
    };
    struct packet * pkt;

Before doing send(), I thought that I will write values inside the packet using

   pkt-> srcID = 01
   pkt-> data = 1 2 3 4

I need to know whether I am on the right path, and if yes then can I send using

      send(sockfd, &packet, sizeof(packet), 0)

for receiving

    recv(newsockfd, &PACKET, sizeof(PACKET), 0)

I have just started with network programming, so I am not sure whether I am on the right path or not. It would be of great help if anyone could guide me in any form (theory, examples, etc.). Thanks in advance.

The pointer pkt is never made to point at any storage in your code, so writing through it is invalid. You have two options:

1) Declare pkt as a normal variable:

 struct packet pkt;

 pkt.srcID = 01;
 ....
 send(sockfd, &pkt, sizeof(struct packet), 0);

2) The second approach is useful when your packet contains a header followed by a payload:

 char buffer[MAX_PACKET_SIZE];
 struct packet *pkt = (struct packet *) buffer;
 char *payload = buffer + sizeof(struct packet);
 int packet_size;  /* should be computed as header size + payload size */

 pkt->srcID = 01;
 ...
 packet_size = sizeof(struct packet) /* + payload size */ ;

 send(sockfd, pkt, packet_size, 0);
 ....

UPDATED (to answer your comment): First, you should know that receiving from a TCP socket MAY NOT return the whole packet in one call. You need to implement a loop (as suggested by Nemo) to read the whole packet. Since you prefer the second option, you need two loops: the first reads the packet header to extract the payload size, and the second reads the data. With UDP, you don't need to worry about partial receives. Here is sample code (without looping), where sockfd is a UDP socket:

    char buffer[MAX_PACKET_SIZE];
    struct packet *pkt = (struct packet *) buffer;
    char *payload = buffer + sizeof(struct packet);
    int packet_size;  /* should be computed as header size + payload size */

    .....
    /* read the whole packet */
    if (recv(sockfd, pkt, MAX_PACKET_SIZE, 0) < 0) {
        /* error in receiving the packet. It is up to you how to handle it */
    }
    /* Now, you can extract srcID as  pkt->srcID */
    /* you can get data by processing payload variable */

Remember:

* You need to implement serialization, as mentioned by other users.
* UDP is an unreliable transport protocol, while TCP is a reliable transport protocol.
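For the TCP case, the receive loop described above could look roughly like the following. This is only a sketch: the name recv_all is made up for illustration, and it assumes a blocking socket on a POSIX system.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: keeps calling recv() until exactly len bytes
 * have arrived. Returns 0 on success, -1 on error or if the peer
 * closed the connection before len bytes were received. */
int recv_all(int sockfd, void *buf, size_t len)
{
    char *p = buf;
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(sockfd, p + got, len - got, 0);
        if (n == 0)
            return -1;              /* peer closed the connection early */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return -1;              /* real error */
        }
        got += (size_t)n;
    }
    return 0;
}
```

With a helper like this, the two loops become two calls: one for sizeof(struct packet) bytes of header, then one for however many payload bytes the header announces.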

Packet Structure

When you are going to send the data across the network then you need to consider having a fixed size header followed by a variable length payload.

[1 byte header] [Variable byte Payload]

The header should give you the size of data you are going to send so that the receiver will always read a fixed size header and then determine the packet length and read the rest of the bytes.

eg:

    int nRet = recv(nSock, (char *)pBuffer, MESSAGE_HEADER_LENGTH, 0);
    if (nRet == MESSAGE_HEADER_LENGTH)
    {
        int nSizeOfPayload = /* get the length from pBuffer */;
        char *pData = malloc(nSizeOfPayload);   /* needs <stdlib.h>; check for NULL */
        int nPayloadLen = recv(nSock, pData, nSizeOfPayload, 0);
        /* on TCP, nPayloadLen may be smaller than nSizeOfPayload, so keep reading */
    }
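On the sending side, the same layout can be built in a plain byte buffer before calling send(). Here is a minimal sketch that takes the diagram literally and uses a single length byte as the header; the names frame_build and frame_payload_len are hypothetical, and a one-byte length limits the payload to 255 bytes.

```c
#include <stddef.h>
#include <string.h>

#define HEADER_LEN  1     /* assumption: one length byte, as in the diagram */
#define MAX_PAYLOAD 255   /* largest length a single byte can describe */

/* Writes [length byte][payload] into out.
 * Returns the total number of bytes written, or 0 if it does not fit. */
size_t frame_build(unsigned char *out, size_t cap,
                   const void *payload, size_t len)
{
    if (len > MAX_PAYLOAD || cap < HEADER_LEN + len)
        return 0;
    out[0] = (unsigned char)len;
    memcpy(out + HEADER_LEN, payload, len);
    return HEADER_LEN + len;
}

/* Given the header bytes, returns how many payload bytes follow. */
size_t frame_payload_len(const unsigned char *header)
{
    return header[0];
}
```

The receiver reads HEADER_LEN bytes first, calls frame_payload_len() on them, and then reads exactly that many more bytes.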

Variable length data in payload

If your structure contains a string, always send the length of the string immediately before the string itself.

Endianness

If you are sending the packet between two different applications running on different machines, you need to agree beforehand on how you represent your bytes, i.e. whether you are going to send the MSB first or the LSB first.
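A common agreement is "network byte order" (MSB first), which htonl() and ntohl() convert to and from. As a small sketch, assuming a POSIX system and with put_u32 and get_u32 being names invented here:

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Stores v at out in network byte order (most significant byte first),
 * regardless of the byte order of the host. */
void put_u32(unsigned char *out, uint32_t v)
{
    uint32_t wire = htonl(v);
    memcpy(out, &wire, sizeof wire);
}

/* Reads a 32-bit value stored in network byte order back into host order. */
uint32_t get_u32(const unsigned char *in)
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

After put_u32(buf, 0x01020304), the buffer holds the bytes 01 02 03 04 on both little-endian and big-endian hosts.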

Directly writing structs out to the network is seductively clean, simple, neat... and unfortunately, wrong. (This doesn't stop a lot of people, including many who should know better, from doing it anyway).

There are several reasons for this¹:

Data representation. This includes the sizes of various types, concerns like endianness, and floating point formats. You can't assume that these are the same on the other end of the connection as they are on your end; they vary a lot by architecture.
Structure layout. Compilers typically add invisible padding between members of structs when they have uneven size - and the layout of this padding also varies by architecture. Bitfields are another structure layout variable.
Canonicalisation. In particular, pointer members are meaningless to send across the socket - they don't mean anything to the other side. You have to send what it points to instead. Another aspect of this is sending an entire array when only some of it is filled with meaningful values - the uninitialised members might leak memory contents that you don't want them to (and sending them is wasteful, anyway).
Partial reads. You end up needing to use a char * pointer when reading anyway, because you might have to resume reading partway through a struct (eg. your struct might be 830 bytes long, but the first read returned only 500 bytes).
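To see the structure layout point for yourself, a tiny sketch like this one reports how much invisible padding your compiler adds. The struct and function names are made up for illustration, and the numbers depend on your compiler and architecture.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative struct: 5 bytes of members, but usually larger in memory. */
struct sample {
    char     tag;     /* 1 byte */
    uint32_t value;   /* 4 bytes, usually aligned to a 4-byte boundary */
};

/* Bytes the compiler added beyond the members themselves. */
size_t sample_padding(void)
{
    return sizeof(struct sample) - (sizeof(char) + sizeof(uint32_t));
}

/* Offset of value inside the struct; greater than 1 when padding follows tag. */
size_t sample_value_offset(void)
{
    return offsetof(struct sample, value);
}
```

On many common platforms this reports 3 bytes of padding, which is exactly the kind of detail the other end of the connection may not share.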

The right way to go about this is to use a process called serialisation: for each data structure you want to send across the network, you have a function which canonicalises the contents and packs them into a char buffer in a defined format. This includes converting integers to a specific endianness (functions like htonl() are useful here). There's also a corresponding function that unpacks a char buffer into the struct form, used on the receiving side.
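As a rough sketch of what such a pair of functions might look like for the struct in the question: the fixed-width field types, the 12-byte wire format and the names packet_pack and packet_unpack are all assumptions made for this example, not something the question specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed fixed-width version of the question's struct. */
struct packet {
    uint32_t srcID;
    int64_t  data;
};

#define PACKET_WIRE_SIZE 12   /* 4 bytes srcID + 8 bytes data, no padding */

/* Packs p into buf, most significant byte first.
 * buf must have room for PACKET_WIRE_SIZE bytes. */
void packet_pack(const struct packet *p, unsigned char *buf)
{
    uint64_t d = (uint64_t)p->data;
    for (int i = 0; i < 4; i++)
        buf[i] = (unsigned char)(p->srcID >> (24 - 8 * i));
    for (int i = 0; i < 8; i++)
        buf[4 + i] = (unsigned char)(d >> (56 - 8 * i));
}

/* Rebuilds the struct from PACKET_WIRE_SIZE bytes produced by packet_pack. */
void packet_unpack(const unsigned char *buf, struct packet *p)
{
    uint32_t id = 0;
    uint64_t d = 0;
    for (int i = 0; i < 4; i++)
        id = (id << 8) | buf[i];
    for (int i = 0; i < 8; i++)
        d = (d << 8) | buf[4 + i];
    p->srcID = id;
    p->data = (int64_t)d;   /* assumes a two's complement host */
}
```

You would then send the 12-byte buffer, never the struct itself, so padding and host byte order no longer matter.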

There are several existing libraries of code for serialisation - for example, Google's Protocol Buffers.

The other workable alternative is to serialise your data structures into a textual format, like JSON. This is very good for troubleshooting, because it means that your network protocol is somewhat human-readable.
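For a struct as small as the one in the question, a textual format can be produced with nothing more than snprintf(). The sketch below uses hypothetical function names, and the sscanf()-based reader only understands the exact layout the writer produces; a real protocol would want a proper JSON library.

```c
#include <stdio.h>

/* Writes {"srcID":<n>,"data":<n>} into buf.
 * Returns the text length, or -1 if it does not fit in cap bytes. */
int packet_to_json(char *buf, size_t cap, int srcID, long data)
{
    int n = snprintf(buf, cap, "{\"srcID\":%d,\"data\":%ld}", srcID, data);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parses only the exact format produced above.
 * Returns 0 on success, -1 otherwise. */
int packet_from_json(const char *buf, int *srcID, long *data)
{
    return sscanf(buf, "{\"srcID\":%d,\"data\":%ld}", srcID, data) == 2 ? 0 : -1;
}
```

Text like this still needs framing on a TCP stream (a length prefix or a delimiter such as a newline) so the receiver knows where one message ends.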

¹ Anticipating some objections: Yes, there are workarounds for most of those issues. But that's just what they are: workarounds, that often rely on compiler-specific features, cancel out most of the simplicity benefits, and still aren't completely reliable anyway.