I have a small test application that executes two threads simultaneously. One increments a static long _value, the other decrements it. I've used ProcessThread.ProcessorAffinity to ensure that the threads are pinned to different physical (non-HT) cores, forcing inter-core communication, and I've ensured that their execution overlaps for a significant amount of time.
Of course, the following does not lead to zero:
for (long i = 0; i < 10000000; i++)
{
    _value += offset;
}
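For reference, here is a minimal sketch of the harness described above, assuming Windows/.NET Framework; the kernel32 P/Invoke, the core masks (0x1, 0x2), and names like Worker are my own illustrative choices, not the original code:

using System;
using System.Diagnostics;
using System.Runtime.InteropServices;
using System.Threading;

static class AffinityTest
{
    static long _value;

    [DllImport("kernel32.dll")]
    static extern int GetCurrentThreadId();

    static void Worker(long offset, IntPtr coreMask)
    {
        // Keep the managed thread on one OS thread so the affinity sticks.
        Thread.BeginThreadAffinity();
        int osId = GetCurrentThreadId();
        foreach (ProcessThread pt in Process.GetCurrentProcess().Threads)
        {
            if (pt.Id == osId)
                pt.ProcessorAffinity = coreMask; // pin to the given core
        }

        for (long i = 0; i < 10000000; i++)
        {
            _value += offset; // the racy read-modify-write under test
        }

        Thread.EndThreadAffinity();
    }

    static void Main()
    {
        var inc = new Thread(() => Worker(+1, (IntPtr)0x1)); // core 0
        var dec = new Thread(() => Worker(-1, (IntPtr)0x2)); // core 1
        inc.Start();
        dec.Start();
        inc.Join();
        dec.Join();
        Console.WriteLine(_value); // almost never 0 for the racy version
    }
}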
So, the logical fix would be:
for (long i = 0; i < 10000000; i++)
{
    Interlocked.Add(ref _value, offset);
}
Which of course leads to zero.
However, the following also leads to zero:
for (long i = 0; i < 10000000; i++)
{
    lock (_syncRoot)
    {
        _value += offset;
    }
}
Of course, the lock statement ensures that the reads and writes are not reordered, because it employs a full fence. However, I cannot find any information concerning the synchronization of processor caches. If there were no cache synchronization, I'd expect to see a deviation from 0 after both threads finished, wouldn't I?
Can someone explain to me how lock/Monitor.Enter/Monitor.Exit ensures that the processor caches (L1/L2) are synchronized?
Cache coherence in this case does not depend on lock. The lock statement ensures that the two threads' instruction sequences are not interleaved.
a += b is not atomic to the processor; it looks like:
- Load data into register from memory
- Increment data
- Store data back
And without a lock the two threads may interleave like this (see the C# sketch after the list):
- Load data into register X from memory
- Load data into register Y from memory
- Increment data (in X)
- Decrement data (in Y)
- Store data back (from X)
- Store data back (from Y) // In this case the increment is lost.
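In C# terms, a sketch of that same decomposition (the temp local stands in for the processor register):

// Conceptually, "_value += offset;" is three separate steps:
long temp = _value;   // 1. load from memory into a register
temp += offset;       // 2. modify the register
_value = temp;        // 3. store the register back to memory
// If the other thread loads _value between steps 1 and 3,
// one of the two updates is overwritten and lost.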
But this is not about cache coherence; it's a higher-level concern.
So, lock does not ensure that the caches are synchronized. Cache synchronization is a processor-internal feature which does not depend on your code; you can read about it under the topic of cache coherence.
When one core writes a value and a second core then tries to read it, the second core won't serve a stale copy from its cache: the write invalidates the corresponding cache entry on the other core, so the read takes a cache miss and fetches the current value.
The CLR memory model guarantees (requires) that loads/stores can't cross a fence. It's up to the CLR implementers to enforce this on real hardware, which they do. However, this is based on the advertised / understood behavior of the hardware, which can be wrong.
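As an illustration of that guarantee, here is a minimal sketch of the classic publish pattern; the use of Volatile.Write/Volatile.Read and the class/field names are my own choices for illustration, not something from the answer:

using System;
using System.Threading;

class Publish
{
    int _payload;
    bool _ready;

    void Producer()
    {
        _payload = 42;                    // ordinary store
        Volatile.Write(ref _ready, true); // release: the store above can't move below it
    }

    void Consumer()
    {
        if (Volatile.Read(ref _ready))    // acquire: the load below can't move above it
        {
            Console.WriteLine(_payload);  // guaranteed to observe 42
        }
    }
}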
The lock keyword is just syntactic sugar for a pair of System.Threading.Monitor.Enter() and System.Threading.Monitor.Exit() calls wrapped in a try/finally. The implementations of Monitor.Enter() and Monitor.Exit() put up a memory fence, which entails performing architecture-appropriate cache flushing. So your other thread won't proceed until it can see the stores that result from the execution of the locked section.
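Concretely, a sketch of what the compiler emits for the lock block above (the C# 4.0+ expansion pattern):

// What "lock (_syncRoot) { _value += offset; }" expands to:
bool lockTaken = false;
try
{
    Monitor.Enter(_syncRoot, ref lockTaken); // acquire side: fence on entry
    _value += offset;
}
finally
{
    if (lockTaken)
        Monitor.Exit(_syncRoot);             // release side: fence on exit; makes the stores visible
}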