Thread Cooperation on Dual-CPU Machines

https://www.devze.com 2023-02-21 23:19 Source: Internet
I remember in a course I took in college, one of my favorite examples of a race condition was one in which a simple main() method started two threads, one of which incremented a shared (global) variable by one, the other decrementing it. Pseudo code:

static int i = 10;

main() {
    new Thread(thread_run1).start();
    new Thread(thread_run2).start();
    waitForThreads();
    print("The value of i: " + i);
}

thread_run1 {
    i++;
}

thread_run2 {
    i--;
}

The professor then asked what the value of i is after a million billion zillion runs. (If it would ever be anything other than 10, essentially.) Students unfamiliar with multithreading systems responded that 100% of the time, the print() statement would always report i as 10.

This was in fact incorrect, as our professor demonstrated that each increment/decrement statement was actually compiled (to assembly) as 3 statements:

1: move value of 'i' into register x
2: add 1 to value in register x
3: move value of register x into 'i'

Thus, the value of i could be 9, 10, or 11. (I won't go into specifics.)
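The "specifics" can be made concrete. Below is a straight-line Java sketch (Java chosen to match the flavor of the pseudocode above) in which local variables stand in for each thread's register; the interleaving is simulated by statement order, not by real threads:

```java
public class LostUpdate {
    public static void main(String[] args) {
        int i = 10;

        // Both threads run step 1 (load) before either runs step 3 (store):
        int regX1 = i;        // thread 1: move value of 'i' into register x
        int regX2 = i;        // thread 2: move value of 'i' into register x
        regX1 = regX1 + 1;    // thread 1: add 1 to value in register x
        regX2 = regX2 - 1;    // thread 2: subtract 1 from value in register x
        i = regX1;            // thread 1: store -> i is now 11
        i = regX2;            // thread 2: store -> i is now 9; the increment is lost

        System.out.println("The value of i: " + i);  // prints 9
    }
}
```

Swap the order of the last two stores and the decrement is the one that gets lost, leaving i at 11.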

My Question:

It was (is?) my understanding that the set of physical registers is processor-specific. When working with dual-CPU machines (note the difference between dual-core and dual-CPU), does each CPU have its own set of physical registers? I had assumed the answer is yes.

On a single-CPU (multithreaded) machine, context switching allows each thread to have its own virtual set of registers. Since there are two physical sets of registers on a dual-CPU machine, couldn't this result in even more potential for race conditions, since you can literally have two threads operating simultaneously, as opposed to 'virtual' simultaneous operation on a single-CPU machine? (Virtual simultaneous operation in reference to the fact that register states are saved/restored each context switch.)

To be more specific - if you were running this on an 8-CPU machine, each CPU with one thread, are race conditions eliminated? If you expand this example to use 8 threads, on a dual-CPU machine, each CPU having 4 cores, would the potential for race conditions increase or decrease? How does the operating system prevent step 3 of the assembly instructions from being run simultaneously on two different CPUs?
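To make the scale of the problem observable, here is a sketch (the class name RacyCounter and the thread/iteration counts are mine, not from the question) that hammers an unsynchronized shared counter from 8 threads. On a multi-core machine the final count almost always falls short of 800,000, because concurrent load/add/store triples overwrite one another:

```java
public class RacyCounter {
    static int counter = 0;  // shared and unsynchronized, on purpose

    public static void main(String[] args) throws InterruptedException {
        final int THREADS = 8, ITERS = 100_000;
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < ITERS; k++) {
                    counter++;  // compiles to load / add / store: not atomic
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println("counter = " + counter
                + " (expected " + THREADS * ITERS + ")");
    }
}
```

The only guaranteed invariant is that the result never exceeds 800,000 (nothing ever decrements); how far it falls short varies run to run.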


Yes, the introduction of dual-core CPUs made a significant number of programs with latent threading races fail quickly. A single-core CPU multitasks by having the scheduler rapidly switch the thread context between threads, which eliminates the class of threading bugs associated with a stale CPU cache.

The example you give can fail on a single core as well, though: the thread scheduler can interrupt a thread just after it has loaded the value of the variable into a register in order to increment it. It just won't fail nearly as frequently, because the odds of the scheduler interrupting the thread at exactly that point aren't great.

There's an operating system feature that lets such programs limp along instead of crashing within minutes: 'processor affinity', available as the /AFFINITY command-line option of start.exe on Windows and as SetProcessAffinityMask() in the winapi. Also review the Interlocked class for helper methods that atomically increment and decrement variables.
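Interlocked is a .NET class; the closest Java equivalent (matching the question's pseudocode) is java.util.concurrent.atomic.AtomicInteger, which performs the load/modify/store as a single atomic operation. A minimal sketch of the original example rewritten that way:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicDemo {
    static final AtomicInteger i = new AtomicInteger(10);

    public static void main(String[] args) throws InterruptedException {
        // incrementAndGet/decrementAndGet do the whole
        // load/add/store cycle atomically, so no update is ever lost.
        Thread t1 = new Thread(i::incrementAndGet);
        Thread t2 = new Thread(i::decrementAndGet);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("The value of i: " + i.get());  // always 10
    }
}
```

On x86 these atomics typically compile down to a LOCK-prefixed instruction, which is exactly the hardware mechanism that stops two CPUs from running step 3 on the same variable at the same time.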


You'd still have a race condition - it doesn't change that at all. Imagine two cores both performing an increment at the same time - they'd both load the same value, increment to the same value, and then store the same value... so the overall increment from the two operations would be one instead of two.

There are additional causes of potential problems where memory models are concerned - where step 1 may not really retrieve the latest value of i, and step 3 may not immediately write the new value of i in a way which other threads can see.

Basically, it all becomes very tricky - which is why it's generally a good idea to either use synchronization when accessing shared data or to use lock-free higher level abstractions which have been written by experts who really know what they're doing.
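A minimal sketch of the synchronization option, in Java: guarding every access to the shared variable with a single lock, so that the three compiled steps of i++ can never interleave with those of i--, and so that reads always observe the latest store:

```java
public class SynchronizedCounter {
    private final Object lock = new Object();
    private int i = 10;

    void increment() {
        synchronized (lock) {  // only one thread at a time runs load/add/store
            i++;
        }
    }

    void decrement() {
        synchronized (lock) {
            i--;
        }
    }

    int value() {
        synchronized (lock) {  // also ensures the reader sees the latest write
            return i;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter c = new SynchronizedCounter();
        Thread t1 = new Thread(c::increment);
        Thread t2 = new Thread(c::decrement);
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println("The value of i: " + c.value());  // always 10
    }
}
```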


First, dual processor versus dual core has no real effect here. A dual-core processor still has two completely separate processors on the chip. They may share some cache, and do share a common bus to memory/peripherals, but the processors themselves are entirely separate. A dual-threaded single core (such as Hyperthreading) is a third variation -- but it has a set of registers per virtual processor as well. The two logical processors share a single set of execution resources, but they retain completely separate register sets.

Second, there are really only two cases that are interesting: a single thread of execution, and everything else. Once you have more than one thread (even if all threads run on a single processor), you have the same potential problems as if you're running on some huge machine with thousands of processors. Now, it's certainly true that you're likely to see the problems manifest themselves a lot sooner when the code runs on more processors (up to as many as you've created threads), but the problems themselves haven't changed at all.

From a practical viewpoint, having more cores is useful for testing. Given the granularity of task switching on a typical OS, it's pretty easy to write code that will run for years without showing problems on a single processor, yet will crash and burn in a matter of hours or even minutes when you run it on two or more physical processors. The problem hasn't really changed, though -- it's just a lot more likely to show up quickly when you have more processors.

Ultimately, a race condition (or deadlock, livelock, etc.) is about the design of the code, not about the hardware it runs on. The hardware can make a difference in what steps you need to take to enforce the conditions involved, but the relevant differences have little to do with simple number of processors. Rather, they're about things like concessions made when you have not simply a single machine with multiple processors, but multiple machines with completely separate address spaces, so you may have to take extra steps to assure that when you write a value to memory that it becomes visible to the CPUs on other machines that can't see that memory directly.
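On a single shared-memory machine, that visibility concern is what memory barriers and language-level constructs such as Java's volatile address. A sketch (class and field names are mine) in which the volatile store is what guarantees the reader thread eventually sees the writer's results, and sees them in order:

```java
public class VisibilityDemo {
    // Without 'volatile', the reader below could legally spin forever on a
    // stale cached value of 'done'; 'volatile' forces the store to be seen.
    static volatile boolean done = false;
    static int payload = 0;

    public static void main(String[] args) throws InterruptedException {
        Thread reader = new Thread(() -> {
            while (!done) { /* spin until the writer's store becomes visible */ }
            // The volatile read of 'done' also guarantees the earlier plain
            // write to 'payload' is visible here (happens-before ordering).
            System.out.println("payload = " + payload);  // prints 42
        });
        reader.start();

        payload = 42;  // ordinary write, published by the volatile store below
        done = true;   // volatile store: makes both writes visible to reader
        reader.join();
    }
}
```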

