I want to understand how a race condition can happen during context switching: where and how does it occur?
I know that a race condition can occur when accessing a shared resource, but I would like to understand it better. Can someone help me grasp this?
Here's a classic example:
    int global_int = 0;

    void update(void) {
        ++global_int;
        /* generated assembly is something like:
             register = global_int
             increment register
             global_int = register
        */
    }
Say the first thread starts running and calls update(), but gets interrupted (by a signal, context switch, whatever) between the second and third instructions. At this stage global_int==0 and register==1: it hasn't saved the result yet.
Now suppose a second thread runs update() and completes, so global_int==1. The first thread resumes and saves register (which is 1) to global_int, yielding no change.
In this situation, global_int==1 after two calls to update() have completed: one increment has been lost. Anything that assumes every call to update() increments global_int will now be broken.
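To make that schedule concrete, here is a minimal single-threaded sketch that replays it by hand. It is only an illustration: the helpers load(), increment(), store() and replay_lost_update() are hypothetical names I made up, and reg_a and reg_b stand in for each thread's private copy of the register.

```c
#include <assert.h>

static int global_int = 0;

/* The three steps of ++global_int, split apart so they can be
   interleaved by hand. Each "thread" has its own register variable. */
static void load(int *reg)        { *reg = global_int; }
static void increment(int *reg)   { *reg += 1; }
static void store(const int *reg) { global_int = *reg; }

/* Replays the schedule described above and returns the final value. */
int replay_lost_update(void) {
    int reg_a = 0, reg_b = 0;
    global_int = 0;

    /* Thread A runs the first two instructions, then is interrupted. */
    load(&reg_a);
    increment(&reg_a);
    assert(global_int == 0 && reg_a == 1);

    /* Thread B runs update() to completion. */
    load(&reg_b);
    increment(&reg_b);
    store(&reg_b);
    assert(global_int == 1);

    /* Thread A resumes and stores its stale register. */
    store(&reg_a);
    return global_int;
}
```

Calling replay_lost_update() returns 1 even though two updates ran. With real threads the same interleaving only happens occasionally, which is what makes the bug so hard to reproduce.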
In general, it is very hard to detect this problem by looking at the code. You have to analyse the data and say to yourself, "global_int is being accessed by different threads, I'd better guard it with a mutex". If you try to get clever and reason about how the threads will access it so as to avoid the expense of a lock, you will probably get it wrong except in trivial cases.
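As a rough sketch of the mutex approach, assuming POSIX threads are available: global_lock, worker(), run_guarded() and MAX_THREADS are names I invented for the example, and the error handling is deliberately minimal.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_THREADS 16  /* arbitrary cap chosen for this sketch */

static int global_int = 0;
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Same update() as before, but the load/increment/store sequence
   now runs with the lock held, so no other thread can interleave. */
static void update(void) {
    pthread_mutex_lock(&global_lock);
    ++global_int;
    pthread_mutex_unlock(&global_lock);
}

/* Each worker calls update() the requested number of times. */
static void *worker(void *arg) {
    int iterations = *(const int *)arg;
    for (int i = 0; i < iterations; i++) {
        update();
    }
    return NULL;
}

/* Starts nthreads workers, waits for all of them, returns the total. */
int run_guarded(int nthreads, int iterations) {
    pthread_t threads[MAX_THREADS];
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    global_int = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++) {
        pthread_join(threads[i], NULL);
    }
    return global_int;
}
```

With the lock in place, run_guarded(4, 100000) should always return 400000. If you remove the lock and unlock calls, the result will typically come out lower on a multi-core machine. Depending on the toolchain you may need to compile with -pthread.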
Race conditions are a consequence of concurrently executing code that accesses a shared resource without proper mechanisms to ensure the consistency of that resource.
A race condition could occur during context switching itself if there is a bug in the implementation of the thread scheduler that causes the code performing the context switch to access a shared resource without providing proper consistency guarantees. Nothing about context-switching code makes it immune to race conditions.
Suppose you were on a single-processor machine with a scheduler that is basically performing time-slicing of the available processor's resources (i.e., we're on a really simple system). Then suppose you have a critical section of code, but you did not guard that critical section with a mutex or other synchronization primitive.
Assume thread A is inside the critical section. When the time-slice for thread A is up, the scheduler stops thread A and schedules another thread B. Thread B then enters the critical section (since there was no guard) and modifies the values in shared memory in the critical section. When thread B's time-slice is up, the OS schedules thread A again, which continues from the point where it left off inside the critical section. The problem now is that the values thread A is working with are not what they were when it was stopped for the context switch: they are completely different, since they were modified by thread B. Thus you have a race condition.
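Here is a small sketch of that scenario, again replayed by hand in a single thread so the outcome is deterministic. The shared variable stock and the functions check_available(), take_one() and replay_check_then_act() are hypothetical; the critical section is "check that an item is left, then take it".

```c
#include <assert.h>
#include <stdbool.h>

static int stock = 1;  /* shared value; should never go below zero */

/* The unguarded critical section, split into its two steps. */
static bool check_available(void) { return stock > 0; }
static void take_one(void)        { stock -= 1; }

/* Replays: A checks, switch to B, B checks and takes, switch back to A. */
int replay_check_then_act(void) {
    stock = 1;

    /* Thread A enters the critical section and sees one item left. */
    bool a_saw_stock = check_available();
    assert(a_saw_stock);

    /* A's time-slice ends here. Thread B runs the whole section. */
    if (check_available()) {
        take_one();
    }
    assert(stock == 0);

    /* Thread A resumes and acts on what it saw before the switch. */
    if (a_saw_stock) {
        take_one();
    }
    return stock;
}
```

replay_check_then_act() returns -1: thread A acted on a check that thread B had already invalidated. Holding a mutex across both the check and the take would force B to wait until A has left the critical section.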