I have been looking at how practical some of the new parallel features in .NET 4.0 are.
Say I have code like so:
foreach (var item in myEnumerable)
    myDatabase.Insert(item.ConvertToDatabase());
Imagine myDatabase.Insert is performing some work to insert into a SQL database.
Theoretically you could write:
Parallel.ForEach(myEnumerable, item => myDatabase.Insert(item.ConvertToDatabase()));
And automatically you get code that takes advantage of multiple cores.
But what if myEnumerable can only be interacted with by a single thread? Will the Parallel class enumerate by a single thread and only dispatch the result to worker threads in the loop?
What if myDatabase can only be interacted with by a single thread? It would certainly not be better to make a database connection per iteration of the loop.
Finally, what if my "var item" happens to be a UserControl or something that must be interacted with on the UI thread?
What design pattern should I follow to solve these problems?
It looks to me like switching over to Parallel/PLINQ/etc. is not exactly easy when you are dealing with real-world applications.
The IEnumerable<T> interface is inherently not thread safe. Parallel.ForEach will automatically handle this, and only parallelize the items coming out of your enumeration. (The sequence will always be traversed one element at a time, in order, but the resulting objects get processed in parallel.)
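As a minimal sketch of that behaviour, the console program below feeds a lazy iterator to Parallel.ForEach and records which threads ran the loop body. The names Source, workerThreads and processed are made up for this illustration; how many threads actually get used depends on the machine and the scheduler.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class EnumerationDemo
{
    // A lazy source that is not thread safe by itself: items are produced one at a time.
    static IEnumerable<int> Source()
    {
        for (int i = 0; i < 20; i++)
            yield return i;
    }

    static void Main()
    {
        // Thread-safe collections, because the loop body may run on several threads.
        var workerThreads = new ConcurrentDictionary<int, bool>();
        var processed = new ConcurrentBag<int>();

        Parallel.ForEach(Source(), item =>
        {
            workerThreads.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
            processed.Add(item);
        });

        Console.WriteLine("Items processed: {0}", processed.Count);
        Console.WriteLine("Distinct threads that ran the body: {0}", workerThreads.Count);
    }
}
```

Every item is processed exactly once even though the iterator itself has no locking; the order in which the body sees the items is not guaranteed.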
If your classes (i.e. the T) cannot be handled by multiple threads, then you should not try to parallelize this routine. Not every sequence is a candidate for parallelization, which is one reason why this isn't done automatically by the compiler ;)
If you're doing work which requires working with the UI thread, this is still potentially possible. However, you'll need to take the same care you would any time you're dealing with user interface elements on background threads, and marshal the data back onto the UI thread. This can be simplified in many cases using the new TaskScheduler.FromCurrentSynchronizationContext API. I wrote about this scenario on my blog here.
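A rough sketch of that pattern, assuming the method is called from a UI thread (WinForms or WPF) where SynchronizationContext.Current is set. The helper name ComputeThenUpdate and its parameters are hypothetical; only the Task and TaskScheduler calls are framework APIs.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class UiMarshalling
{
    // Call this ON the UI thread. FromCurrentSynchronizationContext throws
    // InvalidOperationException if there is no current SynchronizationContext.
    public static Task ComputeThenUpdate(Func<string> compute, Action<string> updateUi)
    {
        // Capture a scheduler that posts work back to the calling (UI) thread.
        TaskScheduler uiScheduler = TaskScheduler.FromCurrentSynchronizationContext();

        return Task.Factory
            // Run the expensive part on the thread pool.
            .StartNew(compute, CancellationToken.None,
                      TaskCreationOptions.None, TaskScheduler.Default)
            // Touch the controls only in the continuation, which runs on the UI thread.
            .ContinueWith(t => updateUi(t.Result), uiScheduler);
    }
}
```

From a button click handler this might be used as ComputeThenUpdate(() => LoadReport(), text => myLabel.Text = text), where LoadReport and myLabel stand in for your own code.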
All of these are legitimate issues - and PLINQ/TPL don't attempt to address them. It's still your job as a developer to write code that can function correctly when parallelized. There's no magic that the compiler/TPL/PLINQ can do to convert code that is unsafe for multithreading into thread-safe code ... you have to make sure that you do so.
For some of the situations you described, you should first decide whether parallelization is even sensible. If the bottleneck will be acquiring a connection to a database or ensuring correct sequencing of operations, then perhaps multithreading isn't appropriate.
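When only the database must stay on one thread, one option is to parallelize just the conversion and keep the inserts sequential. The sketch below assumes the conversion is the expensive, thread-safe part; Row, FakeDatabase and ConvertToDatabase are hypothetical stand-ins for the types in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Row
{
    public int Id;
    public string Payload;
}

// Hypothetical stand-in for myDatabase; Insert is deliberately NOT thread safe.
class FakeDatabase
{
    public readonly List<Row> Rows = new List<Row>();
    public void Insert(Row row) { Rows.Add(row); }
}

class Program
{
    // Hypothetical stand-in for item.ConvertToDatabase(); assumed safe to run concurrently.
    static Row ConvertToDatabase(int item)
    {
        return new Row { Id = item, Payload = "item-" + item };
    }

    static void Main()
    {
        IEnumerable<int> myEnumerable = Enumerable.Range(0, 100);
        var myDatabase = new FakeDatabase();

        // The conversion may run on several worker threads.
        var rows = myEnumerable
            .AsParallel()
            .AsOrdered()
            .Select(item => ConvertToDatabase(item));

        // The foreach body runs on the calling thread only, so the
        // database is touched by a single thread, one row at a time.
        foreach (Row row in rows)
            myDatabase.Insert(row);

        Console.WriteLine("Rows inserted: {0}", myDatabase.Rows.Count);
    }
}
```

If the conversion is cheap compared to the insert, this buys nothing and the plain foreach is the better choice.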
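In the case of how TPL streams an enumerable to multiple threads, your supposition is largely correct. The sequence is enumerated one element at a time, with access to the enumerator serialized, and each work item is then (potentially) dispatched to a separate thread to be acted on. The IEnumerable<T> interface is inherently not thread safe, but TPL handles this behind the scenes for you. Note that serialized access is not the same as thread affinity: the enumerator is not guaranteed to be advanced by the same thread every time.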
What PLINQ/TPL do help you with is managing when and how to dispatch work to multiple threads. The TPL detects when there are multiple cores on a machine and automatically scales the number of threads used to process the data. If a machine only has a single CPU/core, then TPL may choose not to parallelize the work. The benefit to you, the developer, is not having to write two different paths, one for parallel logic and one for sequential. However, the responsibility is still yours to make sure that your code can be safely accessed from multiple threads concurrently.
What design pattern should I follow to solve these problems?
There's no one answer to this question... however, a general practice is to employ immutability in your object design. Immutability makes it safer to consume an object across multiple threads and is one of the most common practices in making operations parallelizable. In fact, languages like F# make extensive use of immutability so that the language itself helps make concurrent programming easier.
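To illustrate the immutability point, here is a small sketch of an immutable type being used from PLINQ. The Point class is invented for this example; because no instance is ever modified after construction, no locking is needed.

```csharp
using System;
using System.Linq;

// Immutable: all state is set in the constructor and never changes afterwards.
sealed class Point
{
    private readonly double x;
    private readonly double y;

    public Point(double x, double y) { this.x = x; this.y = y; }

    public double X { get { return x; } }
    public double Y { get { return y; } }

    // "Mutation" returns a new object instead of changing this one.
    public Point Scale(double factor) { return new Point(x * factor, y * factor); }
}

class Program
{
    static void Main()
    {
        Point[] points = Enumerable.Range(1, 1000)
            .Select(i => new Point(i, i))
            .ToArray();

        // Safe to run on any number of threads: Scale only reads shared state.
        Point[] scaled = points
            .AsParallel()
            .AsOrdered()
            .Select(p => p.Scale(2.0))
            .ToArray();

        Console.WriteLine("First scaled point: ({0}, {1})", scaled[0].X, scaled[0].Y);
    }
}
```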
If you're on .NET 4.0, you should also look into the ConcurrentXXX collection classes in System.Collections.Concurrent. This is where you'll find some lock-free and fine-grained locking collection constructs that make writing multithreaded code easier.
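For example, a ConcurrentDictionary can collect results from a parallel loop without any explicit lock. This is only a sketch with made-up input data:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        string[] words = { "a", "b", "a", "c", "b", "a" };
        var wordCounts = new ConcurrentDictionary<string, int>();

        Parallel.ForEach(words, word =>
        {
            // Adds the key with value 1, or applies the update function to the
            // existing value. The dictionary retries internally on contention.
            wordCounts.AddOrUpdate(word, 1, (key, current) => current + 1);
        });

        foreach (var pair in wordCounts.OrderBy(p => p.Key))
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
    }
}
```

Doing the same with a plain Dictionary<string, int> inside Parallel.ForEach would be a race condition.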
As you have surmised, taking advantage of Parallel.For or Parallel.ForEach requires that you have the ability to compose your work into discrete units (embodied by the lambda that is passed to Parallel.ForEach) that can be executed independently.
There is a great discussion in the answers and comments here: Parallel.For(): Update variable outside of loop.
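As the title of that question suggests, the classic trap is a loop body that updates a variable declared outside the loop. One way to handle it, sketched here with made-up data, is the Parallel.ForEach overload that takes thread-local state, so the shared variable is touched only once per worker rather than once per item:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        int[] values = Enumerable.Range(1, 1000).ToArray();
        long total = 0;

        Parallel.ForEach(
            values,
            // localInit: each worker starts with its own subtotal of zero.
            () => 0L,
            // body: accumulate into the worker's private subtotal, no locking needed.
            (value, loopState, subtotal) => subtotal + value,
            // localFinally: merge each worker's subtotal into the shared total atomically.
            subtotal => Interlocked.Add(ref total, subtotal));

        Console.WriteLine("Total: {0}", total);
    }
}
```

Writing total += value directly in the body would compile, but it is a data race and can silently lose updates.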
The answer is no: the parallel extensions will not think for you. Multithreading issues are still relevant here. This is nice syntactic sugar, but not a panacea.
This is a very good question, and the answer is not 100% clear or concise. I would point you to this reference from Microsoft; it lays out a good bit of detail as to WHEN you should use the parallel features.