I have a problem with a piece of legacy c++/winsock code that is part of a multi-threaded socket server. The application creates a thread that handles connections from clients, of which there are typically a couple of hundred connected at any one time. It typically runs without a problem for several days (continuously), and then suddenly stops accepting connections. This only happens in production, never test.
It uses WSAEventSelect() to detect FD_ACCEPT network events. The (simplified) code for the connection handler is:
SOCKET listener;
HANDLE hStopEvent;
// ... initialise listener and hStopEvent, and other stuff ...
HANDLE hAcceptEvent = WSACreateEvent();
WSAEventSelect(listener, hAcceptEvent, FD_ACCEPT);
HANDLE rghEvents[] = { hStopEvent, hAcceptEvent };
bool bExit = false;
while(!bExit)
{
DWORD nEvent = WaitForMultipleObjects(2, rghEvents, FALSE, INFINITE);
switch(nEvent)
{
case WAIT_OBJECT_0:
bExit = true;
break;
case WAIT_OBJECT_1:
HandleConnect();
WSAResetEvent(hAcceptEvent);
break;
case WAI开发者_StackOverflowT_ABANDONED_0:
case WAIT_ABANDONED_0 + 1:
case WAIT_FAILED:
LogError();
break;
}
}
From detailed logging I know that, when the problem occurs, the thread enters WaitForMultipleObjects() and never emerges, even though there are clients attempting to connect and waiting for an accept. The WAIT_FAILED and WAIT_ABANDONED_x conditions never occur.
While I haven't ruled-out a config problem on the server, or even some kind of resource leak (can't find anything), I am also wondering if the event created by WSACreateEvent() is somehow being 'dissassociated' from the FD_ACCEPT network event - causing it to never fire.
So, am I doing something wrong here? Is there something I should be doing that I'm not? Or a better way? I'd appreciate any suggestions! Thanks.
EDIT
The socket is a non-blocking socket.
EDIT
Problem solved by using the approach suggested by kipkennedy (below). Changed hAcceptEvent to be an auto-reset event, and removed the call to WSAResetEvent() which was no-longer needed.
Maybe an FD_ACCEPT is signaling during HandleConnect() after the accept() and before the return and subsequent ResetEvent(). Then, ResetEvent() ends up resetting all signals and no re-enabling accept() is ever called. For example, the following sequence is possible:
- Event signaled, WaitForMultipleObjects() returns
- During HandleConnect(), sometime after accept() is called, the Event is signaled again
- HandleConnect() returns
- ResetEvent() resets the event, masking the second signal
- WaitForMultipleObjects() never returns since as far as Windows is concerned, it has already signaled the subsequent event and no subsequent accepts() have re-enabled it
A couple possible solutions: 1) loop on accept() in HandleConnect() until WSAEWOULDBLOCK is returned 2) use an auto-reset event or immediately reset the event before calling HandleConnect()
Code looks fine. The only thing I can suggest is calling WSAWaitForMultipleObjects() instead of the global version.
From reading the docs, it appears that WSAEventSelect()
is as parsimonious about notifications as WSAAsyncSelect()
. The stack doesn't signal FD_ACCEPT
every time a connection comes in. To Winsock, the notification is its way of saying:
You called
accept()
earlier, and it failed withWSAEWOULDBLOCK
. Go ahead and call it again, it should succeed this time.
The solution is to call accept()
before calling WSAEventSelect()
, and only call WSAEventSelect()
after you get WSAEWOULDBLOCK
. For this to work as you expect, you need to have set the listening socket to non-blocking. (That might seem obvious, but it isn't actually required.)
After an accept event occured you must not do a WSAResetEvent(hAcceptEven). You must issue a WSAEnumNetworkEvents ( listener, hAcceptEvent, &some_struct). This functions clears the internal state of the socket (ar copies this state into some_struct) and after that you can receive new connections.
精彩评论