OpenMP & MPI explanation_问答_开发者_运维开发者技术经验分享

A few minutes ago I stumbled upon some text, which reminded me of something that has been wondering my mind for a while, but I had nowhere to ask.

So, in hope this may be the place, where people have hands on experience with both, I was wondering if someone could explain what is the difference between OpenMP and MPI开发者_如何转开发 ?

I've read the Wikipedia articles in whole, understood them in segments, but am still pondering; for a Fortran programmer who wishes one day to enter the world of paralellism (just learning the basics of OpenMP now), what is the more future-proof way to go ?

I would be grateful on all your comments

OpenMP is primarily for tightly coupled multiprocessing -- i.e., multiple processors on the same machine. It's mostly for things like spinning up a number of threads to execute a loop in parallel.

MPI is primarily for loosely couple multiprocessing -- i.e., a cluster of computers talking to each other via a network. It can be used on a single machine as kind of a degenerate form of a network, but it does relatively little to take advantage of its being a single machine (e.g., having extremely high bandwidth communication between the "nodes").

Edit (in response to comment): for a cluster of 24 machines, MPI becomes the obvious choice. As noted above (and similar to @Mark's comments) OpenMP is primarily for multiple processors that share memory. When you don't have shared memory, MPI becomes the clear choice.

At the same time, assuming you're going to be using multiprocessor machines (is there anything else anymore?) you might want to use OpenMP to spread the load in each machine across all its processors.

Keep in mind, however, that OpenMP is generally quite a lot quicker/easier to put into use than MPI. Depending on how much speedup you need, scaling up instead of out (i.e. a fewer machines with more processors each) can make the software development enough quicker/cheaper that it can be worthwhile even though it rarely gives the lowest price per core.

Another view, not inconsistent with what @Jerry has already written is that OpenMP is for shared-memory parallelisation and MPI is for distributed-memory parallelisation. Emulating shared-memory on distributed systems is rarely convincing or successful, but it's a perfectly reasonable approach to use MPI on a shared-memory system.

Of course, all (?) multicore PCs and servers are shared-memory systems these days so the execution model for OpenMP is widely applicable. MPI tends to come into its own on clusters on which processors communicate with each other over a network (which is sometimes called an interconnect and is often of a higher-spec than office Ethernet).

In terms of applications I would estimate that a large proportion of parallel programs can be successfully implemented with either OpenMP or MPI and that your choice between the two is probably best driven by the availability of hardware. Most of us (parallel-ists) would regard OpenMP as easier to get into than MPI, and it is certainly (I assert) easier to incrementally parallelise an existing program with OpenMP than with MPI.

However, if you need to use more processors than you can get in one box (and how many processors that is is increasing steadily) then MPI is your better choice. You may also stumble across the idea of hybrid programming -- for example if you have a cluster of multicore PCs you might use MPI between PCs, and OpenMP within a PC. I've not seen any evidence that the additional complexity of programming is rewarded by improved performance, and I've seen some evidence that it is definitely not worth the effort.

And, as one of the comments has already stated, I think that Fortran is future-proof enough in the domain of parallel, high-performance, scientific and engineering applications. The latest (2008) edition of the standard incorporates co-arrays (ie arrays which themselves are distributed across a memory system with non-local and local access) right into the language. There are even one or two early implementations of this feature. I don't yet have any experience of them and expect that there will be teething issues for a few years.

EDIT to pick up on a number of points in OP's comments ...

No, I don't think that it's a bad idea to approach parallel computing via OpenMP. I think that OpenMP and MPI (or, more accurately, the models of parallel computing that they implement) are complementary. I certainly use both, and I suspect that most professional parallel programmers do too. I hadn't done much OpenMP since leaving university about 6 years ago until about 2 years ago when multicores really started popping up everywhere. Now I probably do about equal amounts of both.

In terms of your further (self-)education I think that the book Using OpenMP by Chapman et al is better than the one by Chandra, if only because it is much more up to date. I think that the Chandra book pre-dates OpenMP 2, and the Chapman book pre-dates OpenMP 3 which is worth learning.

On the MPI side the books by Gropp et al, Using MPI and Using MPI-2 are indispensable; this is perhaps because they are (as far as I have found) the only tutorial introductions to MPI rather than because of their excellence. I don't think that they are bad, mind you but they don't have a lot of competition. I like Parallel Scientific Computing in C++ and MPI by Karniadakis and Kirby too; depending on your level of scientific computing knowledge though you may find much of the material too basic.

But what I think the field lacks entirely (hope someone can prove me wrong here ?) is a good textbook (or handful of textbooks) on the design of programs for parallel execution, something to help the experienced Fortran (in our case) programmer make the jump from serial to parallel program design. Lots of info on how to parallelise a loop or nest of loops, not so much on options for parallelising computations on structured positive semi-definite matrices (or whatever). For that level of information we have to dig quite hard into the research papers (ACM and IEEE digital libraries are well worth the modest annual costs -- if you are at an academic institution your library probably has subscriptions to these and a lot more, I'm lucky in that my employers pay for my professional society memberships and add-ons, but if they didn't I would pay myself).

As to your plans for a new lab with, say, 24 processors (CPUs ? or cores ?, doesn't really matter, just asking) then the route you take should depend on the depths of your pocket. If you can afford it I'd suggest:

-- Consider a shared-memory computer, certainly a year ago Sun, SGI and IBM could all supply a shared-memory system with that sort of number of cores, I'm not sure of the current state of the market but since you have until Feb to decide it's worth looking into. A shared-memory system gives you the shared-memory parallelism option, which a cluster doesn't, and message-passing on a shared-memory platform should run at light speed. (By the way, if you go this route, benchmark this aspect of the system, there have been some bad MPI implementations on shared-memory computers.) A good MPI implementation on a shared-memory computer (my last experience of this was on a 512 processor SGI Altix) doesn't send any messages, it just moves a few pointers around and is, consequently, blisteringly fast. Trouble with the Altix was that beyond 128 processors the memory bus tended to get overwhelmed by all the traffic; that was the time to switch to MPI on a cluster or an MPP box.

-- Again, if you can afford it, I'd recommend having a system integrator deliver you a working system, and avoid building a cluster (or whatever) yourself. If, like me, you are a programmer first and a reluctant system integrator way second, this is an easier approach and will deliver you a working system on which you can start programming far sooner.

If you can't afford the expensive options, then I'd go for as many rack-mounted servers with 4 or 8 cores per box (choice is price dependent, and maybe even 16 cores per box is worth considering today) and, today, I'd be planning for at least 4GB RAM per core. Then you need the fastest interconnect you can afford; GB Ethernet is fine, but Infiniband (or the other one whose name I forget) is finer, though the jump in price is noticeable. And you'll need a PC to act as head node for your new cluster, running the job management system and other stuff. There's a ton of excellent material on the Internet on building and running clusters, often under the heading of Beowulf, which was the name of what is considered to have been the first 'home-brew' cluster.

Now, since you have until February to get your lab up and running, fire 2 of your colleagues and turn their PCs into a mini-Beowulf. Download and install a likely-looking MPI installation (OpenMPI is good but there are others to consider and your o/s might dictate another choice). Now you can start getting ready for when the lab is ready.

PS You don't have to fire 2 people if you can scavenge 2 PCs some other way. And the PCs can be old and inadequate for desktop use they are just going to be a training platform for you and your colleagues (if you have any left). The more nearly identical they are the better.

As said above, OpenMP is certainly the easier way to program as compared to MPI because of incremental parallelization. OpenMP has been used mostly for fine grain parallelism (loop level), while MPI more of a coarse-grain parallelism (domain decomposition). Both are good ways to obtain parallel performance.

We have an OpenMP and an MPI versions of our software (Fortran) and Customers use both depending on their needs.

With the current trends in multi-core architecture, hybrid OpenMP-MPI is another viable approach.