Are there any important changes in how SLES 10 implements TCP sockets vs. SLES 9?
I have several apps written in C# (.NET 3.5) that run on Windows XP and Windows Server 2003. They've been running fine for over a year, getting market data from a SLES 9 machine using a socket connection.
The machine was upgraded today to SLES 10 and it's causing some strange behavior. The socket normally returns a few hundred or a few thousand bytes every second. But occasionally, I stop receiving data. Ten or more seconds will go by with no data, and then Receive returns with 10k+ bytes. Some buffer also appears to be losing data, because the bytes I receive on the socket no longer form a correct packet.
The only thing that changed was the SLES 9 to 10 upgrade, and rolling back fixes the problem immediately. Any ideas?
The dropped packets can be resolved by upgrading the SMP kernel to 2.6.16.60-0.37 or later. The bnx2 kernel module is the root cause of the dropped packets. This is a known issue with SLES 10 out of the box.
Reference: http://www.novell.com/support/search.do?cmd=displayKC&sliceId=SAL_Public&externalId=7002506
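To check whether a given machine is already at or past that kernel, something like the following might help. It is only a sketch: the helper names release_key and has_fix are made up for illustration, and it assumes the release string looks like the version above, optionally followed by a flavour suffix such as -smp.

```python
import re

# Version named in the answer above.
FIXED_RELEASE = "2.6.16.60-0.37"


def release_key(release):
    """Turn a release string such as '2.6.16.60-0.37-smp' into a tuple of ints."""
    # Keep only the leading numeric part; a flavour suffix like '-smp' is ignored.
    match = re.match(r"\d+(?:[.-]\d+)*", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in re.split(r"[.-]", match.group(0)))


def has_fix(release, fixed=FIXED_RELEASE):
    """Return True if 'release' is at or after the fixed version."""
    return release_key(release) >= release_key(fixed)
```

On the SLES box itself, has_fix(platform.release()) (after import platform) should give the answer for the running kernel, since platform.release() reports the same string as uname -r.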
The defaults for /proc/sys/net settings may have changed. Maybe newer SLES enables things like tcp_ecn?
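One way to test that guess is to save the output of sysctl -a on the SLES 9 and SLES 10 machines and compare the net.* entries. A rough sketch, assuming the usual "key = value" line format; the function names parse_sysctl and diff_settings are hypothetical:

```python
def parse_sysctl(text):
    """Parse 'key = value' lines (as printed by sysctl -a) into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines that have no '=' at all
            settings[key.strip()] = value.strip()
    return settings


def diff_settings(old, new, prefix="net."):
    """Return {key: (old_value, new_value)} for keys under 'prefix' that differ.

    A key present on only one side shows up with None on the other side.
    """
    changed = {}
    for key in sorted(set(old) | set(new)):
        if key.startswith(prefix) and old.get(key) != new.get(key):
            changed[key] = (old.get(key), new.get(key))
    return changed
```

Feeding it the two saved dumps, e.g. diff_settings(parse_sysctl(sles9_text), parse_sysctl(sles10_text)), leaves a short list of TCP settings worth looking at first.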
If your network is dropping some packets that it doesn't like from SLES 10, then SLES 10 is probably enabling newer TCP features. Otherwise I don't know. I'd look at the traffic with tcpdump/wireshark, and maybe strace the server process to see what system calls it is making.
SLES is the sender, so it's possible something could have changed that made it decide to wait until it had a full window of data or something. But 10k is too much. Sounds more like dropped packets, and then a large return when a missing packet finally arrives, allowing the queued up data to be returned too.
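To line the stalls up against a tcpdump capture, it can help to log the time and size of every successful receive on the client and then pick out the long gaps. A minimal sketch of the second half, assuming you already have such a log as (timestamp in seconds, byte count) pairs; find_stalls and the 5-second threshold are illustrative choices, not anything from the apps in question:

```python
def find_stalls(events, min_gap=5.0):
    """Find receives that arrived after a long silence.

    events: list of (timestamp_seconds, byte_count), one per successful
    receive, in arrival order.
    Returns a list of (timestamp, gap_seconds, byte_count) for every receive
    that came at least min_gap seconds after the previous one.
    """
    stalls = []
    for (prev_t, _), (t, size) in zip(events, events[1:]):
        gap = t - prev_t
        if gap >= min_gap:
            stalls.append((t, gap, size))
    return stalls
```

If the large receives always follow a gap and the capture shows retransmissions in the same window, that would support the dropped-packet explanation.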