pthread_mutex_timedlock and the deadlock

Developer https://www.devze.com 2023-04-03 18:02 Source: Internet
Typically, if task1 holds lock A and wants to take lock B, while task2 has taken lock B and is waiting for lock A (held by task1), the result is a deadlock.

But pthread_mutex_timedlock either acquires the mutex or gives up once the specified timeout expires.

I hit the deadlock scenario while trying to take the timed lock, which should eventually have timed out, and this puzzles me.

Edit: Deadlocks can be avoided with a better design, which is what I ended up doing: I made sure the mutexes are always taken in the same order. But the question remains open: shouldn't the deadlock have been avoided anyway, since I chose timedlock?
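
One way to enforce a single acquisition order is to sort the two mutexes by address before locking them. The following is only a sketch of that idea: lock_pair and unlock_pair are hypothetical helper names, not part of pthreads, and they assume the two mutexes are distinct.

```c
#include <pthread.h>
#include <stdint.h>

/* Lock two distinct mutexes in a fixed global order (lowest address first),
   so every thread acquires them in the same sequence regardless of the
   order in which the caller passes them. Hypothetical helper. */
static void lock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    if ((uintptr_t)x > (uintptr_t)y) {
        pthread_mutex_t *tmp = x;
        x = y;
        y = tmp;
    }
    pthread_mutex_lock(x);
    pthread_mutex_lock(y);
}

/* Unlock order does not matter for deadlock avoidance. */
static void unlock_pair(pthread_mutex_t *x, pthread_mutex_t *y)
{
    pthread_mutex_unlock(x);
    pthread_mutex_unlock(y);
}
```

With this, T1 calling lock_pair(&lockA, &lockB) and T2 calling lock_pair(&lockB, &lockA) both take the mutexes in the same order, so the circular wait cannot form.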

Can someone explain this behaviour to me?

Edit: Attaching sample code to make the scenario clearer (the real tasks are fairly complicated and run into thousands of lines).

T1

pthread_mutex_lock(&lockA);
//call some API, which results in locking lockB
pthread_mutex_lock(&lockB);
//unlock in reverse order of acquisition
pthread_mutex_unlock(&lockB);
pthread_mutex_unlock(&lockA);

T2

pthread_mutex_lock(&lockB);
//call some API, which results in locking lockA
pthread_mutex_timedlock(&lockA, &abs_timeout); //abs_timeout: absolute time, 10 sec from now
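
For reference, pthread_mutex_timedlock takes an absolute CLOCK_REALTIME deadline rather than a relative duration. A minimal sketch of how the 10-second timeout above might be built follows; lock_with_timeout is a hypothetical wrapper name, not a pthreads function.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Try to take `m`, giving up `seconds` from now. Returns 0 on success,
   ETIMEDOUT if the deadline passed first, or another error number
   (for example EDEADLK) reported by pthread_mutex_timedlock. */
static int lock_with_timeout(pthread_mutex_t *m, time_t seconds)
{
    struct timespec deadline;

    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return errno;
    deadline.tv_sec += seconds;   /* absolute deadline = now + seconds */
    return pthread_mutex_timedlock(m, &deadline);
}
```

In T2 this would be called as lock_with_timeout(&lockA, 10). The return value has to be checked: on ETIMEDOUT the caller does not own lockA, and releasing lockB before retrying is what actually breaks the cycle with T1.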

The crash is seen in the context of T2, bt:

Program terminated with signal 6, Aborted.
#0  0x57edada0 in raise () from /lib/libc.so.6
(gdb) bt
#0  0x57edada0 in raise () from /lib/libc.so.6
#1  0x57edc307 in abort () from /lib/libc.so.6
#2  0x57ed4421 in __assert_fail () from /lib/libc.so.6
#3  0x57bb2a7c in pthread_mutex_timedlock () from /lib/libpthread.so.0

I traced the error to the following assertion:

pthread_mutex_timedlock: Assertion `(-(e)) != 35 || (kind != PTHREAD_MUTEX_ERRORCHECK_NP && kind != PTHREAD_MUTEX_RECURSIVE_NP)' failed.


In the glibc sources, the assert in pthread_mutex_timedlock() looks like this:

    int e = INTERNAL_SYSCALL (futex, __err, 4, &mutex->__data.__lock,
                              __lll_private_flag (FUTEX_LOCK_PI,
                                                  private), 1,
                              abstime);
    if (INTERNAL_SYSCALL_ERROR_P (e, __err))
      {
        if (INTERNAL_SYSCALL_ERRNO (e, __err) == ETIMEDOUT)
          return ETIMEDOUT;

        if (INTERNAL_SYSCALL_ERRNO (e, __err) == ESRCH
            || INTERNAL_SYSCALL_ERRNO (e, __err) == EDEADLK)
          {
            assert (INTERNAL_SYSCALL_ERRNO (e, __err) != EDEADLK
                    || (kind != PTHREAD_MUTEX_ERRORCHECK_NP
                        && kind != PTHREAD_MUTEX_RECURSIVE_NP));
            /* ESRCH can happen only for non-robust PI mutexes where
               the owner of the lock died.  */
            assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH
                    || !robust);

Most likely e == EDEADLK (the 35 in the assertion message) and kind is either PTHREAD_MUTEX_ERRORCHECK_NP or PTHREAD_MUTEX_RECURSIVE_NP. The other thing to notice is that the timeout is handled before this check, so hitting the assertion means the call did not time out.
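
To see that deadlock detection pre-empts the timeout, here is a small sketch that relocks a mutex from the thread that already owns it. It deliberately uses a plain (non-PI) error-checking mutex rather than the PI mutex implied by FUTEX_LOCK_PI above, so it illustrates the EDEADLK return path, not the assertion itself. relock_errorcheck_mutex is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Lock an error-checking mutex, then try to take it again from the same
   thread with a 10-second timed lock. Returns the result of the second
   attempt. */
static int relock_errorcheck_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    struct timespec deadline;
    int rc;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += 10;

    pthread_mutex_lock(&m);
    rc = pthread_mutex_timedlock(&m, &deadline);  /* self-relock */

    pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

For an error-checking mutex the second call is expected to return EDEADLK straight away instead of waiting out the 10 seconds, which matches the observation that the timeout is never reached.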

In the kernel, it is futex_lock_pi_atomic() that returns the EDEADLK code:

 /*
  * Detect deadlocks.
  */
 if ((unlikely((curval & FUTEX_TID_MASK) == vpid)))
         return -EDEADLK;

 /*

The code above compares the TID of the thread that holds the mutex with the TID of the thread that is trying to acquire it. If they are the same, the thread is trying to acquire a mutex that it has already acquired.
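
The comparison can be mimicked in user space as a rough illustration. The constants below mirror the values in the Linux futex ABI (<linux/futex.h>) but are redefined under hypothetical SKETCH_ names so the example builds on its own; owner_is_caller is likewise an invented name, not a kernel function.

```c
#include <stdint.h>

/* The low 30 bits of a PI futex word hold the owner's TID; the top bit
   flags that waiters exist. Values copied from the Linux futex ABI. */
#define SKETCH_FUTEX_WAITERS  0x80000000u
#define SKETCH_FUTEX_TID_MASK 0x3fffffffu

/* Returns nonzero when the futex word says the caller already owns the
   lock, i.e. the situation in which the kernel returns -EDEADLK. */
static int owner_is_caller(uint32_t futex_word, uint32_t caller_tid)
{
    return (futex_word & SKETCH_FUTEX_TID_MASK) == caller_tid;
}
```

The flag bits are masked off before the comparison, so a word with the waiters bit set still matches its owner's TID.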


First of all, what timeout did you specify? Was it large?

Apart from timing out, pthread_mutex_timedlock fails under three conditions:
1) A deadlock condition was detected, or the current thread already owns the mutex.
2) The mutex could not be acquired because the maximum number of recursive locks for the mutex has been exceeded.
3) The value specified by mutex does not refer to an initialized mutex object.

Was your code subject to any of the above?

A code snippet would also help us see the problem.
