How are Scala 2.9 parallel collections working behind the scenes?_问答_开发者

How are Scala 2.9 parallel collections working behind the scenes?

开发者 https://www.devze.com 2023-03-12 18:19 出处：网络

Scala 2.9 introduced parallel collections. They are a really great tool for certain tasks. However, how do they work internally and am I able to influence the behavior/configuration?

What method do they use to figure out the optimal number of threads? If I am not satisfied with the result are there any configuration parameters to adjust?

I'm not only interested how many threads are actually c开发者_StackOverflowreated, I am also interested in the way how the actual work is distributed amongst them. How the results are collected and how much magic is going on behind the scenes. Does Scala somehow test if a collection is large enough to benefit from parallel processing?

Briefly, there are two orthogonal aspects to how your operations are parallelized:

The extent to which your collection is split into chunks (i.e. the size of the chunks) for a parallelizable operation (such as map or filter)
The number of threads to use for the underlying fork-join pool (on which the parallel tasks are executed)

For #2, this is managed by the pool itself, which discovers the "ideal" level of parallelism at runtime (see java.lang.Runtime.getRuntime.availableProcessors)

For #1, this is a separate problem and the scala parallel collections API does this via the concept of work-stealing (adaptive scheduling). That is, when a particular piece of work is done, a worker will attempt to steal work from other work-queues. If none is available, this is an indication that all of the processors are very busy and hence a bigger chunk of work should be taken.

Aleksandar Prokopec, who implemented the library gave a talk at this year's ScalaDays which will be online shortly. He also gave a great talk at ScalaDays2010 where he describes in detail how the operations are split and re-joined (there are a number of issues that are not immediately obvious and some lovely bits of cleverness in there too!).

A more comprehensive answer is available in the PDF describing the parallel collections API.