
Causes of Linux UDP packet drops

https://www.devze.com 2023-03-03 03:46 Source: web

I have a Linux C++ application which receives sequenced UDP packets. Because of the sequencing, I can easily determine when a packet is lost or re-ordered, i.e. when a "gap" is encountered. The system has a recovery mechanism to handle gaps; however, it is best to avoid gaps in the first place. Using a simple libpcap-based packet sniffer, I have determined that there are no gaps in the data at the hardware level. However, I am seeing a lot of gaps in my application. This suggests the kernel is dropping packets; it is confirmed by looking at the /proc/net/snmp file. Whenever my application encounters a gap, the Udp InErrors counter increases.

At the system level, we have increased the max receive buffer:

# sysctl net.core.rmem_max
net.core.rmem_max = 33554432

At the application level, we have increased the receive buffer size:

int sockbufsize = 33554432;
int ret = setsockopt(my_socket_fd, SOL_SOCKET, SO_RCVBUF,
        &sockbufsize, sizeof(sockbufsize));
// check return code
sockbufsize = 0;
socklen_t optlen = sizeof(sockbufsize);
ret = getsockopt(my_socket_fd, SOL_SOCKET, SO_RCVBUF,
        &sockbufsize, &optlen);
// print sockbufsize

After the call to getsockopt(), the printed value is always double what was requested (67108864 in the example above). This is expected: the kernel doubles the requested size to leave room for its own bookkeeping overhead, as documented in socket(7).

I know that failure to consume data quickly enough can result in packet loss. However, all this application does is check the sequencing, then push the data into a queue; the actual processing is done in another thread. Furthermore, the machine is modern (dual Xeon X5560, 8 GB RAM) and very lightly loaded. We have literally dozens of identical applications receiving data at a much higher rate that do not experience this problem.


Besides a too-slow consuming application, are there other reasons why the Linux kernel might drop UDP packets?

FWIW, this is on CentOS 4, with kernel 2.6.9-89.0.25.ELlargesmp.


If you have more threads than cores and equal priority between them, it is likely that the receiving thread is starved of the time it needs to drain the incoming buffer. Consider running that thread at a higher priority than the others.
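A minimal sketch of raising the receive thread's priority, assuming a pthreads-based application. SCHED_FIFO normally requires root (or CAP_SYS_NICE), so the return code must be checked:

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <cstring>

// Promote the given thread to a real-time scheduling class so it
// preempts equal-priority worker threads. Returns false on failure
// (typically EPERM when run unprivileged).
bool raise_rx_priority(pthread_t rx_thread)
{
    sched_param sp{};
    sp.sched_priority = sched_get_priority_min(SCHED_FIFO) + 1;
    int rc = pthread_setschedparam(rx_thread, SCHED_FIFO, &sp);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setschedparam: %s\n", std::strerror(rc));
    return rc == 0;
}
```

Note that pthread functions return the error code directly rather than setting errno, which is why the result is passed to strerror() instead of using perror().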

Similarly, though often less productive, you can bind the receiving thread to one core so that you do not pay the overhead of switching between cores and the associated cache flushes.
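A sketch of pinning the receive thread to a single core; core 0 here is an arbitrary choice, and pthread_setaffinity_np is a glibc extension rather than standard POSIX:

```cpp
#include <pthread.h>
#include <sched.h>
#include <cstdio>
#include <cstring>

// Restrict the given thread to one CPU so it does not migrate between
// cores and lose its cache working set. Returns false on failure.
bool pin_to_core(pthread_t t, int core)
{
    cpu_set_t cpus;
    CPU_ZERO(&cpus);
    CPU_SET(core, &cpus);
    int rc = pthread_setaffinity_np(t, sizeof(cpus), &cpus);
    if (rc != 0)
        std::fprintf(stderr, "pthread_setaffinity_np: %s\n", std::strerror(rc));
    return rc == 0;
}
```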


I had a similar problem with my program. Its task is to receive UDP packets in one thread and, via a blocking queue, write them to a database from another thread.

I noticed (using vmstat 1) that when the system was experiencing heavy I/O wait (reads), my application did not receive packets, even though the system was receiving them.

The problem was that when heavy I/O wait occurred, the thread writing to the database was I/O-starved while holding the queue mutex. The UDP buffer then overflowed with incoming packets, because the main thread that received them was blocked in pthread_mutex_lock().

I resolved it by adjusting the I/O niceness (the ionice command) of my process and the database process. Changing the I/O scheduling class to Best Effort helped. Surprisingly, I am now unable to reproduce the problem even with the default I/O niceness. My kernel is 2.6.32-71.el6.x86_64.

I'm still developing this app so I'll try to update my post once I know more.


int ret = setsockopt(my_socket_fd, SOL_SOCKET, SO_RCVBUF, (char *)&sockbufsize, (int)sizeof(sockbufsize));

First of all, setsockopt() takes (int, int, int, const void *, socklen_t), so no casts are required.

Using a simple libpcap-based packet sniffer, I have determined that there are no gaps in the data at the hardware level. However, I am seeing a lot of gaps in my application. This suggests the kernel is dropping packets;

It suggests that your environment is not fast enough. Packet capturing is processing-intensive, and you may observe that the global rate of transmission on an interface drops when you start a capture program such as iptraf-ng or tcpdump on it.


I don't have enough reputation to comment, but similar to @racic's answer, I had a program with one receive thread and one processing thread, with a blocking queue between them. I noticed the same packet-drop issue because the receiving thread was waiting on a lock for the blocking queue.

To resolve this I added a small local buffer in the receiving thread, and had it push data into the shared queue only when the queue was not locked (using std::mutex::try_lock).
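A minimal sketch of that approach (names and the Packet type are illustrative, not from the original program): the receive thread always appends to its own buffer, and drains it into the shared queue only when the mutex can be taken without blocking, so the socket read path never stalls behind a slow consumer:

```cpp
#include <mutex>
#include <deque>
#include <vector>

struct Packet { std::vector<char> data; };

std::mutex queue_mutex;
std::deque<Packet> shared_queue;   // drained by the processing thread
std::deque<Packet> local_buffer;   // owned by the receive thread only

// Called by the receive thread for each packet read from the socket.
void on_packet_received(Packet&& pkt)
{
    local_buffer.push_back(std::move(pkt));
    std::unique_lock<std::mutex> lk(queue_mutex, std::try_to_lock);
    if (lk.owns_lock()) {          // drain only when uncontended
        while (!local_buffer.empty()) {
            shared_queue.push_back(std::move(local_buffer.front()));
            local_buffer.pop_front();
        }
    }
    // Otherwise keep buffering locally and retry on the next packet.
}
```

The trade-off is slightly higher latency for queued packets under contention, in exchange for never blocking the receive loop.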
