It seems that Python standa开发者_开发百科rd library lacks various useful concurrency-related concepts such as atomic counter, executor and others that can be found in e.g. java.util.concurrent. Are there any external libraries that would provide easier building blocks for concurrent Python applications?
Kamaelia, as already mentioned, is aimed at making concurrency easier to work with in python.
Its original use case was network systems (which are a naturally concurrent) and developed with the viewpoint "How can we make these systems easier to develop and maintain".
Since then life has moved on and it is being used in a much wider variety of problem domains from desktop systems (like whiteboarding applications, database modelling, tools for teaching children to read and write) through to back end systems for websites (like stuff for transcoding & converting user contributed images and video for web playback in a variety of scenarios and SMS / text messaging applications.
The core concept is essentially the same idea as Unix pipelines - except instead of processes you can have python generators, threads, or processes - which are termed components. These communicate over inboxes and outboxes - as many as you like of each, rather than just stdin/stdout/stderr. Also rather than requiring serialised file interfaces, you pass between components fully fledged python objects. Also rather than being limited to pipelines, you can have arbitrary shapes - called graphlines.
You can find a full tutorial (video, slides, downloadable PDF booklet) here:
- http://www.kamaelia.org/PragmaticConcurrency
Or the 5 minute version here (O'Reilly ignite talk):
- http://yeoldeclue.com/cgi-bin/blog/blog.cgi?rm=viewpost&nodeid=1235690128
The focus on the library is pragmatic development, system safety and ease of maintenance though some effort has gone in recently towards adding some syntactic sugar. Like anything the developers (me and others :-) welcome feedback on improving it.
You can also find more information here: - http://www.slideshare.net/kamaelian
Primarily, Kamaelia's core (Axon) was written to make my day job easier, and to wrap up best practice (message passing, software transactional memory) in a reusable fashion. I hope it makes your life easier too :-)
Although it may not be immediately obvious, itertools.count is indeed an atomic counter (the only operation on an instance x
thereof, spelled next(x)
, is equivalent to an "atomic ++x
" if C had such a concept;-). Edit: at least, this surely holds in CPython; I thought it was part of the Python standard definition but apparently IronPython and Jython disagree (not ensuring thread-safety of count.next in their current implementations) so I may well be wrong!
That is, suppose you currently have a data structure such as:
counters = dict.fromkeys(words_of_interest, 0)
...
if w in counters: counters[w] += 1
and your problem is that the latter increment is not atomic, so if two threads are at the same time dealing with the same word of interest the two increments might interfere (only one would "take", so the counter would be incremented only by one, not by two). Then:
counters = dict((w, itertools.count()) for w in words_of_interest)
...
if w in counters: next(counters[w])
will perform the same operations, but in an atomic way.
(There is unfortunately no obvious, documented way to "extract the current value of the counter", though in fact str(x)
does return a string such as 'count(3)'
from which the current value can be parsed out again;-).
Concurrency in Python (at least CPython) and Java are wildly different, at least in part because of the Global Interpreter Lock (GIL). In general, concurrency in Python is achieved not with threads, but processes. See multiprocessing
for the "standard" concurrency module.
Also, check out "A Curious Course on Coroutines and Concurrency" for some concurrency techniques that were pretty new to me coming from Java. David Beazley (the author) is a Smart Guy™ when it comes to Python in general, and concurrency in particular.
kamaelia provides tools for abstracting concurrency to threads or process etc.
P-workers creates a "Job-Worker" abstraction over the python multiprocessing library. It simplifies concurrency with multiprocessing by starting "Workers" that have specific skills/attributes (defined functions), and providing a queue where they receive "Jobs" from. Its somewhat analogous to a thread pool, only with processes instead of threads. Therefore its better suited for a high number of CPU instructions. You can also use it to spawn multiple instances of a single application or even spawn "Workers" that have multiple threads.
精彩评论