I am just learning how to write mutithreaded programs right now and I have a hypothetical question about how many threads are optimal for a program.
Let me describe 2 scenarios.
The first scenario is that i have a program that is easily multi threaded but each thread is going to be doing a lot of work (execution times for each thread is on the order of seconds).
The second scenario is that i have a program that is also easily multi thread but each thread has a very short execution time, on the order of milliseconds.
In either one of these scenarios, what is the most efficient way of mutithreading the programs? Is it either creating as many threads as my system memory will allow, or wait for threads to finish before creating new ones so that I only ever have a maxim of 4 worker threads running at any one time.
On one hand, many threads may have overhead problems with cores switching in between threads (from my understanding, its not such a severe overhead). On the other hand, if I limit the number of threads running, that means 开发者_如何学运维that I'll be running extra checking conditions and locking and unlocking a counter variable to keep track of the number of threads running, and creating new threads when old ones finish.
I can see that if there are many small threads, it would be best to simply overload my system with as many threads as possable as since there wont be too much thread switching before a thread finishes running. That would save me the overhead of constantly keeping track of the number of threads.
Also, if there are only a few large threads (by few, i mean a couple hundred or so large ones) it would make sense to keep track of the threads so that we keep the threads at optimal numbers such that there isent very much thread switching (as since the overhead would be greater as we would likely switch many times before a single thread finishes).
So would these assumptions be correct for each case or is there a universal way of doing things that would be correct in all situations?
Note: this is assuming a muti core system (for now, lets ignore hyperthreading) and lets ignore any of the typical problems associated with mutithreading (assume that all threads have private write locations and can only read from public ones, locking and unlocking only happens when incrementing or decrementing the counter for number of active threads).
Thanks,
-Faken
Scenario #1: Make n threads, where 'n' is the number of CPU cores
Scenario #2: The same, but instead of creating and killing the threads all the time, use a Workitem / Thread Pool based approach, like the .NET Parallel Framework does.
Edit: This is a good article that covers #2 - http://msdn.microsoft.com/en-us/magazine/cc163340.aspx ; let PFx figure out how many threads to run, you just describe how tasks are related to each other.
The normal approach to this is to make the thread count configurable, and profile the application performance across several configurations.
Also note that in many cases, it's not the overhead associated with many threads or context switching but bottlenecks caused by synchronizing access to shared resources that cause inefficiency in multithreaded apps. Even if you assume that your code is deadlock-proof, if there's a lot of IO going on, poor synchronization implementations can effectively kill any benefit your parallelization would otherwise have gained you.
This isn't a question with a hard-and-fast answer, a few points though:
Since your threads are so short lived, maybe you should think about using a pool to manage them? You could create a pool with a number of threads that suits the host system and task profile (say one for each core to start with) and feed it work to do on a queue of some sort. By doing this you eliminate the overhead of starting new threads, allocating a stack for each one etc for each task.
As for appropriate number of threads for the pool, this rather depends on the types of tasks you are running. If they are CPU bound tasks, then one thread per CPU is a good fit: you avoid context switching when you don't need to then. If on the other hand, they are IO bound tasks, say threads doing socket communication, then you might want to double that number foe example, so you can make better use of your processors when you are waiting for IO input.
Anyway, in short, there is no one-size-fits-all approach for this kind of stuff. As ever, profile your code in action to find out where the inefficiencies are and tune it based on your results.
I would start with a good enough number and then collect statistics to find out the correct number of threads to run to achieve good performance.
Assuming you mean a Windows program, even if it is a C++ rather than a dot-Net program it will repay you to skim Joe Duffy's "Concurrent Programming on Windows" before beginning. He makes a nice pitch for using the thread-pooling-routines provided by Windows, most convincingly because they already internally tune for the processor configuration, taking that burden off your shoulders.
If you then go ahead and roll-your-own anyway, the gotcha's discussed throughout the book will surely save you tripping over the standard pitfalls.
Threads are not cheap. I know of basically two reasons to use them:
To get multiple pieces of hardware working in parallel, whether they be CPU cores, disk heads, some other kind of machine, or servers on the other side of the world.
To get multiple people working in parallel, like users with their own sessions. Here the advantage is not speed, but ease of coding each user's interaction sequence.
Or both, like if you have one thread to crunch, and one to respond to the user.
精彩评论