How do I prevent a race condition WITHOUT locking or using mutexes/semaphores in C++? I'm dealing with a nested for loop in which I will be setting a value in an array:
for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
        for (int k = 0; k < o; ++k)
            array[k] += foo(...);
More or less, I want to deal with this so that I can ensure different threads running at the same time don't write to array[k] at the same time. Any suggestions on how to approach this?
Edit: I am running on a Linux machine and I also have to use the Intel compiler. I will be using "icc" instead of "gcc" to compile the code.
For this particular loop, turn it inside out. Put k on the outside; then you can farm k=0 out to a different thread than k=1, and so on. As long as foo() doesn't depend on array[k] for any k other than its current k, you're golden.
Assuming Windows, and that array contains elements of type LONG, you could do something like:
for (int i = 0; i < m; ++i)
    for (int j = 0; j < n; ++j)
        for (int k = 0; k < o; ++k) {
            LONG val = foo(...);
            InterlockedAdd(&array[k], val);
        }
If you're not working on Windows, your platform may have a similar set of APIs. As long as your platform has an InterlockedCompareExchange()-style API, you can write your own version of InterlockedAdd().
Something like the following (disclaimer - untested):
int InterlockedAdd(int volatile* pDest, int operand)
{
    int curval = *pDest;
    int oldval;
    do {
        oldval = curval;
        // Returns the previous value of *pDest; the swap succeeded
        // only if that previous value equals oldval.
        curval = InterlockedCompareExchange(pDest, oldval + operand, oldval);
    } while (curval != oldval);  // retry if another thread got in first
    return oldval + operand;
}
As far as I know, Boost has only limited support for atomic/interlocked operations, apparently just enough to support atomic manipulation of reference counts. I don't think the interlocked support in Boost is more than an implementation detail (I'm currently dealing with a somewhat older version of Boost, so it's possible this is no longer the case).
There are some portable libraries that support atomic compare and exchange and other atomic operations as documented parts of the interface:
- Apache APR:
http://apr.apache.org/docs/apr/1.4/group__apr__atomic.html
- glib:
http://library.gnome.org/devel/glib/stable/glib-Atomic-Operations.html
- Intel Thread Building Blocks:
http://www.threadingbuildingblocks.org/
Also note that C++0x will have support for atomic operations like compare/exchange - I'm not sure what the level of support is in current C++ compilers (it doesn't appear to be in VS 2010).
Assuming the array holds integers, use gcc's atomic builtins: __sync_fetch_and_add should do the trick (icc supports these gcc builtins too).
The way you want it, it cannot be done! (See sbi's comment.)
You could use shared memory, but there would still be locks.
You could also use just one thread for all reading from and writing to the array. If you think it's easier to set up the correct protocol for that, go on.
Anyway, good solutions using locks (either directly or indirectly) have already been given. Just pick one :)
Partition the innermost loop among the threads. Thread T1 processes indices in the range [0, L), thread T2 processes indices in the range [L, 2L), etc., where L = o/T and T is the number of threads. This assumes that the foo() call does not use other locations that might be concurrently computed.
EDIT: using interlocked operations, as others have suggested, will give correct result but it might degrade performance badly. (If the inner loop is short, many threads will be competing for few memory locations, which will effectively serialize your program.)
The easiest way (though not the most efficient!) is to wrap the loop in a critical section. See the Wikipedia article on critical sections; this will restrict the loop code to a single thread at any given time.
"Using interlocked operations, as others have suggested, will give the correct result but it might degrade performance badly. (If the inner loop is short, many threads will be competing for few memory locations, which will effectively serialize your program.)"
Not true, my friend. Interlocked operations are actually superior to all kinds of locking. More precisely: all synchronization objects (critical sections, mutexes, events, etc.) are themselves implemented in terms of interlocked operations. In fact, interlocked operations are the only processor instructions that can guarantee synchronization consistency; it's simply impossible to implement a synchronization object without using interlocked operations at all.
The other question is the scope of locking. You probably meant that synchronization (implemented either with sync objects or directly with interlocked operations) inside an inner loop may lead to performance degradation, since it's executed many times.
Well, that is true. But on the other hand, with interlocked operations you lock only what you need, for only as long as you need it. Hence you minimize potential synchronization collisions.