What could be causing a deadlock or otherwise causing this concurrency test to inconsistently fail?

开发者 https://www.devze.com 2023-01-18 10:16 Source: Internet
I've been fiddling around with CHESS, which seems like an incredibly useful tool. However, ironically, I seem to be dealing with a Heisenbug in one of my test methods. The results reported by CHESS when I run this test are unpredictable:

  • Sometimes the test will pass
  • Sometimes the test will fail, with no further description (simply: "Test failed")
  • Sometimes the test will fail with instructions to duplicate*
  • Sometimes the test will indicate "CHESS detected deadlock"

Initially, I thought this inconsistency must be due to the fact that the test involves the use of Random objects. It must have been that different seed values were yielding different outcomes, right?

So I updated the test to simply run for a predefined set of seed values (0 to 10). Thread-local Random objects get seeded by a (pseudo-)random value produced by a shared Random within a lock. The code looks basically exactly like this:

[Code snippet not preserved in this copy.]

(Update: I am running this on .NET 3.5, as CHESS only supports VS 2008. I wonder if the problem could have something to do with this?)

As I understand it, the above code should actually be pretty deterministic. Since sharedRandom is initialized with a known seed (between 0 and 10), the values produced by the localRandom object belonging to each thread running the code inside the Parallel.For call should be consistent from one test run to the next (which thread gets which seed from sharedRandom may differ between runs, but among the 5 iterations within Parallel.For, the same 5 seeds should be used for localRandom).
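To make the reasoning above concrete, here is a minimal sketch of the seeding scheme as the prose describes it. It is not the original test code: `SeedDemo` and `DrawSeeds` are hypothetical names, and plain `Thread` objects stand in for `Parallel.For` so the sketch does not depend on a particular Parallel Extensions build. It only checks the narrow claim that the set of seeds handed out is the same from run to run, even though which thread receives which seed may differ.

```csharp
using System;
using System.Threading;

static class SeedDemo
{
    // Hypothetical helper: each worker draws one seed from a shared Random
    // while holding a lock, as described in the question.
    public static int[] DrawSeeds(int sharedSeed, int workers)
    {
        var sharedRandom = new Random(sharedSeed);
        var gate = new object();
        var seeds = new int[workers];
        var threads = new Thread[workers];

        for (int i = 0; i < workers; i++)
        {
            int slot = i; // copy so each closure captures its own index
            threads[i] = new Thread(() =>
            {
                int seed;
                lock (gate) { seed = sharedRandom.Next(); }
                // The real test would build its thread-local generator
                // here, e.g. new Random(seed); this sketch only records the seed.
                seeds[slot] = seed;
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();

        // Sorting removes the scheduling-dependent ordering, leaving only
        // the set of seeds that were drawn.
        Array.Sort(seeds);
        return seeds;
    }

    public static bool SameSeedsAcrossRuns(int sharedSeed, int workers)
    {
        int[] first = DrawSeeds(sharedSeed, workers);
        int[] second = DrawSeeds(sharedSeed, workers);
        for (int i = 0; i < workers; i++)
        {
            if (first[i] != second[i]) return false;
        }
        return true;
    }

    public static void Main()
    {
        for (int seed = 0; seed <= 10; seed++)
        {
            Console.WriteLine(seed + ": " + SameSeedsAcrossRuns(seed, 5));
        }
    }
}
```

If this reports the same seed set for every run, any remaining nondeterminism has to come from how the threads interleave after seeding, not from the Random values themselves.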

That's how I understand it. But from the CHESS results, I'm inclined to believe I must be missing something.

  1. Is there a deadlock in the above code that I'm too dumb to see?
  2. Should I not be using the Random class in concurrency-related tests?
  3. For those who have experience using CHESS: is it a reliable tool? Does it sometimes give false positives? This is actually a big one, as if it turns out that this scenario is common (inconsistent test results), then perhaps it'd be advisable for me to hold off on using CHESS at all for the time being.

*...which I haven't been able to figure out how to use -- but that's a separate issue.


No answers, so I'll give it a shot. It isn't obvious to me how the posted snippet could fail; I suspect the real problem is in the comment.

I don't have hands-on experience with CHESS, but I've studied it well enough to know that you cannot rely on it to ever give you reproducible test results. Its approach to uncovering threading problems is very much statistical: it injects random delays into the threads. It is designed to recreate the kind of threading problems that are so heavily influenced by timing, especially race conditions.

A race condition can go undetected for a very long time if code execution timing is predictable. And when it strikes, it is incredibly hard to diagnose. A good example of this is a large government project I heard of that shipped with the logging kept turned on, because with it turned off the software would no longer work and there was no good way to diagnose the problem without the logging info.
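As a rough illustration of a timing-dependent bug of this kind (a generic sketch, unrelated to the code in the question; `RaceDemo` and its members are hypothetical names), the following increments two counters from several threads. The plain `++` is a non-atomic read-modify-write and may lose updates depending on how the threads interleave, while `Interlocked.Increment` never does.

```csharp
using System;
using System.Threading;

static class RaceDemo
{
    static int unsafeCount;
    static int safeCount;

    public static int UnsafeCount { get { return unsafeCount; } }
    public static int SafeCount { get { return safeCount; } }

    public static void Run(int threadCount, int perThread)
    {
        unsafeCount = 0;
        safeCount = 0;
        var workers = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            workers[i] = new Thread(() =>
            {
                for (int n = 0; n < perThread; n++)
                {
                    unsafeCount++;                        // not atomic: updates can be lost
                    Interlocked.Increment(ref safeCount); // atomic increment
                }
            });
            workers[i].Start();
        }
        foreach (var w in workers) w.Join();
    }

    public static void Main()
    {
        Run(4, 100000);
        Console.WriteLine("expected: " + (4 * 100000));
        Console.WriteLine("unsafe:   " + UnsafeCount);
        Console.WriteLine("safe:     " + SafeCount);
    }
}
```

The unsafe total can match the expected value on many runs and fall short on others, which is why a bug like this can stay hidden while timing happens to be favorable.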

Treat CHESS as a diagnostic tool. If it raises a flag, you can be fairly sure that you have a real, but still hard to solve, threading problem.


I certainly don't see a deadlock there. It's likely that Random has internal locking, but that should be fine.

You might want to try Jinx (www.corensic.com). Rather than producing a report, Jinx just alters the effective performance of various CPUs. So it can't really produce false positives.

If the tiny sample deadlocks under Jinx, it's definitely capable of deadlocking during normal use. Assuming it does deadlock, you should be able to break into the deadlock with Visual Studio and see where the threads are.

Disclaimer: I work for Corensic. And I don't think the small snippet you posted has a real deadlock. But I'm curious, so let us know what you find.
