Boost.Thread Assertion/Crash on Windows during win32::WaitForSingleObject

I have a rarely occurring issue in my code in which an assertion is triggered inside the Boost.Thread library. I haven't been able to reproduce it in a stand-alone example, and I don't really know what is causing it, so it's hard to provide a sample case. I am hoping that somebody familiar with the internals of Boost.Thread may be able to help.

Here is what I know:

- The problem occurs when a boost::lock_guard<boost::recursive_mutex> (or variations using unique_lock and a normal non-recursive mutex) is declared.
- It happens in a handler function for Boost.Asio. On the stack is the thread that does io_service::run, a bunch of glue to call the Asio callback function, followed by my callback function (triggered by an async_write call). The first line of that function is the declaration of the lock_guard<> which is causing the problem.
- The this pointer inside my function is valid, and the object has not been deleted or anything like that. The debugger shows that it points to valid data. The mutex that is being locked in my handle_write function also guards against deletion of the memory that the handling function uses.
- This works fine, I'd say 9,999 times out of 10,000, with heavy multi-threaded usage going on. The problem occurs with the same frequency if I reduce the number of threads used by the application to just one thread which handles Asio run() calls, plus a main UI thread.
- The first line of my code calls the lock() method of the mutex (in the ctor of boost::unique_lock<>), which calls lock() in boost::detail::basic_recursive_mutex_impl, which in turn calls the lock() method of boost::detail::basic_timed_mutex.

In Boost 1.46, the assertion (BOOST_VERIFY) is on line 78 of basic_timed_mutex.hpp, which calls win32::WaitForSingleObject:

do
{
    BOOST_VERIFY(win32::WaitForSingleObject(
                      sem,::boost::detail::win32::infinite)==0);
    clear_waiting_and_try_lock(old_count);
    lock_acquired=!(old_count&lock_flag_value);
}
while(!lock_acquired);

At the time the Boost.Thread code is waiting to acquire a lock on the mutex (which is what the code path that uses WaitForSingleObject does), no other thread is holding the mutex (at least at the time the assertion occurs, as far as can be examined in the debugger). This is odd because it should be able to obtain the lock without having to wait for another thread to relinquish control.
Things look very odd when examining the members of the mutex. These are the values of all of the local and member variables (unless otherwise noted, they are the same every time this happens):
- sem - 0xdddddddddddddddd - This is always the same, on every crash.
- lock_acquired - false.
- old_count - 0xdddddddddddddddd.
- this - Appears to be valid, and the address of it matches what the object holding it has (the object of which handle_write is a method). It does not appear to have been deleted or messed with in any way.
- this->active_count - A negative integer, ranges I've seen have been between -570000000 and -580000000.
- this->event - 0xdddddddddddddddd.
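
One possible reading of these values, offered as a hedged interpretation rather than a diagnosis: the Microsoft debug C runtime fills freed heap blocks with the byte 0xDD, and the bit pattern 0xDDDDDDDD read as a signed 32-bit integer is -572662307, which falls inside the range reported for active_count. The sketch below uses only standard C++ to check both observations. It assumes a debug build using the MSVC debug heap, and it assumes the lock flag tested by `old_count&lock_flag_value` is the top bit of the 32-bit count; all helper names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Byte written over freed blocks by the MSVC debug heap
// (assumption: the application runs against the debug CRT).
constexpr unsigned char kFreedFill = 0xDD;

// True if every byte of the 64-bit value equals the freed-memory fill byte.
inline bool looks_like_freed_fill(std::uint64_t value) {
    unsigned char bytes[sizeof value];
    std::memcpy(bytes, &value, sizeof value);
    for (unsigned char b : bytes) {
        if (b != kFreedFill) return false;
    }
    return true;
}

// Reinterpret a 32-bit pattern as the signed value a debugger shows for a long.
inline std::int32_t as_signed32(std::uint32_t bits) {
    std::int32_t out;
    std::memcpy(&out, &bits, sizeof out);
    return out;
}

// Assumption: the "locked" flag is the top bit of the 32-bit count.
inline bool top_bit_set(std::uint32_t bits) {
    return (bits & 0x80000000u) != 0;
}
```

Under those assumptions, `looks_like_freed_fill(0xddddddddddddddddULL)` is true, `as_signed32(0xDDDDDDDDu)` is -572662307, and `top_bit_set(0xDDDDDDDDu)` is true. The last point would make a mutex whose memory was overwritten with the fill pattern look permanently locked, even though no thread holds it.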

I am unfortunately unable to see the result of the WaitForSingleObject call. The MSDN entry for the API function lists four possible return values, two of them impossible in this scenario. Since WaitForSingleObject is being called with an invalid event handle (sem = 0xdddddddddddddddd), I assume it's returning 0xFFFFFFFF (WAIT_FAILED) and that GetLastError would indicate that an invalid handle has been supplied.
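
To make the four documented return values concrete, here is a small portable sketch that decodes them. The constants mirror the Windows SDK values so that the code compiles without windows.h; the function names are hypothetical. Presumably the two impossible results are WAIT_TIMEOUT (the wait is infinite) and WAIT_ABANDONED (which applies only to mutex objects, not events).

```cpp
#include <cstdint>
#include <string>

// Values mirrored from the Windows SDK so the sketch builds on any platform.
constexpr std::uint32_t kWaitObject0   = 0x00000000u; // WAIT_OBJECT_0
constexpr std::uint32_t kWaitAbandoned = 0x00000080u; // WAIT_ABANDONED
constexpr std::uint32_t kWaitTimeout   = 0x00000102u; // WAIT_TIMEOUT
constexpr std::uint32_t kWaitFailed    = 0xFFFFFFFFu; // WAIT_FAILED

// Turn a WaitForSingleObject result into a readable name.
inline std::string describe_wait_result(std::uint32_t result) {
    switch (result) {
        case kWaitObject0:   return "WAIT_OBJECT_0";
        case kWaitAbandoned: return "WAIT_ABANDONED";
        case kWaitTimeout:   return "WAIT_TIMEOUT";
        case kWaitFailed:    return "WAIT_FAILED";
        default:             return "unknown";
    }
}

// The check in the quoted snippet compares the result with 0,
// so only WAIT_OBJECT_0 lets it pass.
inline bool verify_would_pass(std::uint32_t result) {
    return result == kWaitObject0;
}
```

With these definitions, `describe_wait_result(0xFFFFFFFFu)` yields "WAIT_FAILED" and `verify_would_pass(0xFFFFFFFFu)` is false, which is consistent with the assertion firing when the handle is invalid.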

So the actual problem, it seems, is that the get_event() method of basic_timed_mutex is returning 0xdddddddddddddddd. However, the MSDN entry for CreateEvent (which get_event() eventually uses) tells me that it returns either a valid handle to an event, or NULL.
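
One way the two facts could both hold is if get_event() creates the event lazily and returns the cached member whenever it is already non-null. That is an assumption about the Boost implementation and is not confirmed here. In that case a member that already contains garbage would be returned as-is and CreateEvent would never be called. The sketch below illustrates only that pattern; the struct and its members are hypothetical stand-ins, not Boost's code.

```cpp
#include <cstdint>

using Handle = void*;

// Hypothetical stand-in for a mutex that creates its event on first use.
struct LazyEventHolder {
    Handle event = nullptr;
    int create_calls = 0;

    // Stand-in for CreateEvent: returns some non-null handle.
    Handle create_event() {
        ++create_calls;
        static int dummy;
        return &dummy;
    }

    // Create the event only if the cached member is still null.
    Handle get_event() {
        if (event == nullptr) {
            event = create_event();
        }
        return event;
    }
};

// A non-null garbage value, like a member overwritten with a fill pattern.
inline Handle garbage_handle() {
    return reinterpret_cast<Handle>(static_cast<std::uintptr_t>(0xDDDDDDDDu));
}
```

A freshly constructed holder calls create_event() once on the first get_event(). A holder whose event member was overwritten with garbage_handle() returns that garbage value and never calls create_event(), so the creation function's documented return values do not constrain what get_event() hands back.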

Again, this is probably the best description of the problem I can provide since it isn't reproducible reliably outside of this specific application. I hope somebody has ideas as to what may be causing this!

I guess it will be very difficult to give a precise answer to your problem, but it looks like you have a heap corruption problem. Have you tried using AppVerifier with normal page heap enabled? If you then attach a debugger to the process and heap corruption occurs, it will hopefully break when a corrupted heap block is encountered, and you can even look at the call stack of the allocating code.

Edit: if you are using WinDbg, you can also put a conditional breakpoint on WaitForSingleObject (or any other function) that breaks only if the call fails, and then check the last error, e.g.: bp kernel32!WaitForSingleObject "gu; .if(eax == 0) {g}". This tells the debugger, when the breakpoint is hit, to i) run to the end of the function (gu) and ii) check the return value (stored in the EAX register), continuing execution (g) if everything was fine. If an error is returned, you can check the value of GetLastError() with the !gle extension command.