Winsock accept event sometimes stops signaling (WSAEventSelect)_问答_开发者

I have a problem with a piece of legacy c++/winsock code that is part of a multi-threaded socket server. The application creates a thread that handles connections from clients, of which there are typically a couple of hundred connected at any one time. It typically runs without a problem for several days (continuously), and then suddenly stops accepting connections. This only happens in production, never test.

It uses WSAEventSelect() to detect FD_ACCEPT network events. The (simplified) code for the connection handler is:

SOCKET listener;
HANDLE hStopEvent;

// ... initialise listener and hStopEvent, and other stuff ...

HANDLE hAcceptEvent = WSACreateEvent();
WSAEventSelect(listener, hAcceptEvent, FD_ACCEPT); 
HANDLE rghEvents[] = { hStopEvent, hAcceptEvent };

bool bExit = false;
while(!bExit)
{
    DWORD nEvent = WaitForMultipleObjects(2, rghEvents, FALSE, INFINITE);
    switch(nEvent)
    {
        case WAIT_OBJECT_0:
            bExit = true;
            break;
        case WAIT_OBJECT_1:
            HandleConnect();
            WSAResetEvent(hAcceptEvent);
            break;
        case WAI开发者_StackOverflowT_ABANDONED_0:
        case WAIT_ABANDONED_0 + 1:
        case WAIT_FAILED:
            LogError();
            break;
    }
}

From detailed logging I know that, when the problem occurs, the thread enters WaitForMultipleObjects() and never emerges, even though there are clients attempting to connect and waiting for an accept. The WAIT_FAILED and WAIT_ABANDONED_x conditions never occur.

While I haven't ruled-out a config problem on the server, or even some kind of resource leak (can't find anything), I am also wondering if the event created by WSACreateEvent() is somehow being 'dissassociated' from the FD_ACCEPT network event - causing it to never fire.

So, am I doing something wrong here? Is there something I should be doing that I'm not? Or a better way? I'd appreciate any suggestions! Thanks.

EDIT

The socket is a non-blocking socket.

EDIT

Problem solved by using the approach suggested by kipkennedy (below). Changed hAcceptEvent to be an auto-reset event, and removed the call to WSAResetEvent() which was no-longer needed.

Maybe an FD_ACCEPT is signaling during HandleConnect() after the accept() and before the return and subsequent ResetEvent(). Then, ResetEvent() ends up resetting all signals and no re-enabling accept() is ever called. For example, the following sequence is possible:

Event signaled, WaitForMultipleObjects() returns
During HandleConnect(), sometime after accept() is called, the Event is signaled again
HandleConnect() returns
ResetEvent() resets the event, masking the second signal
WaitForMultipleObjects() never returns since as far as Windows is concerned, it has already signaled the subsequent event and no subsequent accepts() have re-enabled it

A couple possible solutions: 1) loop on accept() in HandleConnect() until WSAEWOULDBLOCK is returned 2) use an auto-reset event or immediately reset the event before calling HandleConnect()

Code looks fine. The only thing I can suggest is calling WSAWaitForMultipleObjects() instead of the global version.

From reading the docs, it appears that WSAEventSelect() is as parsimonious about notifications as WSAAsyncSelect(). The stack doesn't signal FD_ACCEPT every time a connection comes in. To Winsock, the notification is its way of saying:

You called accept() earlier, and it failed with WSAEWOULDBLOCK. Go ahead and call it again, it should succeed this time.

The solution is to call accept() before calling WSAEventSelect(), and only call WSAEventSelect() after you get WSAEWOULDBLOCK. For this to work as you expect, you need to have set the listening socket to non-blocking. (That might seem obvious, but it isn't actually required.)

After an accept event occured you must not do a WSAResetEvent(hAcceptEven). You must issue a WSAEnumNetworkEvents ( listener, hAcceptEvent, &some_struct). This functions clears the internal state of the socket (ar copies this state into some_struct) and after that you can receive new connections.