Some discussion in another question has encouraged me to to better understand cases where locking is required in multithreaded Python programs.
Per this article on threading in Python, I have several solid, testable examples of pitfalls that can occur when multiple threads access shared state. The example race condition provided on this page involves races between threads reading and manipulating a shared variable stored in a dictionary. I think the case for a race here is very obvious, and fortunately is eminently testable.
However, I have been unable to evoke a race condition with atomic operations such as list appends or variable increments. This te开发者_如何学编程st exhaustively attempts to demonstrate such a race:
from threading import Thread, Lock
import operator
def contains_all_ints(l, n):
l.sort()
for i in xrange(0, n):
if l[i] != i:
return False
return True
def test(ntests):
results = []
threads = []
def lockless_append(i):
results.append(i)
for i in xrange(0, ntests):
threads.append(Thread(target=lockless_append, args=(i,)))
threads[i].start()
for i in xrange(0, ntests):
threads[i].join()
if len(results) != ntests or not contains_all_ints(results, ntests):
return False
else:
return True
for i in range(0,100):
if test(100000):
print "OK", i
else:
print "appending to a list without locks *is* unsafe"
exit()
I have run the test above without failure (100x 100k multithreaded appends). Can anyone get it to fail? Is there another class of object which can be made to misbehave via atomic, incremental, modification by threads?
Do these implicitly 'atomic' semantics apply to other operations in Python? Is this directly related to the GIL?
Appending to a list is thread-safe, yes. You can only append to a list while holding the GIL, and the list takes care not to release the GIL during the append
operation (which is, after all, a fairly simple operation.) The order in which different thread's append operations go through is of course up for grabs, but they will all be strictly serialized operations because the GIL is never released during an append.
The same is not necessarily true for other operations. Lots of operations in Python can cause arbitrary Python code to be executed, which in turn can cause the GIL to be released. For example, i += 1
is three distinct operations, "get i
', "add 1 to it" and "store it in i
". "add 1 to it" would translate (in this case) into it.__iadd__(1)
, which can go off and do whatever it likes.
Python objects themselves guard their own internal state -- dicts won't get corrupted by two different threads trying to set items in them. But if the data in the dict is supposed to be internally consistent, neither the dict nor the GIL does anything to protect that, except (in usual thread fashion) by making it less likely but still possible things end up different than you thought.
In CPython, thread switching is done when sys.getcheckinteval() bycodes have been executed. So a context switch can never occur during the execution of a single bytecode, and operations that are encoded as a single bytecode are inherently atomic and threadsafe, unless that bytecode executes other Python code or calls C code that releases the GIL. Most operations on the built-in collection types (dict, list etc) fall into the 'inherently threadsafe' category.
However this is an implementation detail that is specific to the C implementation of Python, and should not be relied upon. Other versions of Python (Jython, IronPython, PyPy etc) may not behave in the same way. There is also no guarantee that future versions of CPython will keep this behaviour.
精彩评论