I have the following problem:
Program 1 has a huge amount of data, say 10GB. The data in question consists of large integer and double arrays. Program 2 has 1..n MPI processes that use tiles of this data to compute results.
How can I send the data from program 1 to the MPI Processes?
Using file I/O is out of the question. The compute node has sufficient RAM.
It should be possible, depending on your MPI implementation, to run several different programs in the same MPI job. For instance, using OpenMPI you can run
mpirun -n 1 big_program : -n 20 little_program
and both programs will share a single MPI_COMM_WORLD. From there you can use the usual MPI functions to pass your data from the big program to the little ones.
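For illustration, here is a minimal sketch of what the two executables might look like; the tile size and the one-tile-per-rank distribution are assumptions for the example, not part of the original question:

    /* big_program.c -- rank 0 under the mpirun line above; owns the data */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int tile_len = 1 << 20;      /* doubles per tile, illustrative */
        double *tile = malloc(tile_len * sizeof(double));
        for (int dest = 1; dest < size; dest++) {
            /* fill `tile` with the portion meant for rank `dest`, then: */
            MPI_Send(tile, tile_len, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
        }
        free(tile);
        MPI_Finalize();
        return 0;
    }

    /* little_program.c -- ranks 1..20; receives a tile and computes */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        const int tile_len = 1 << 20;      /* must match the sender's size */
        double *tile = malloc(tile_len * sizeof(double));
        MPI_Recv(tile, tile_len, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* ... compute on the tile ... */
        free(tile);
        MPI_Finalize();
        return 0;
    }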
One answer might be to have the two programs reside in separate communicators: a single executable could launch both sets of apps using MPI-2's dynamic process management, and the "producer" program could communicate with the "consumer" application through MPI_COMM_WORLD. All subsequent IPC within the consumer app would then have to run inside a subcommunicator that excludes the producer portion, which means rewriting the consumer to avoid direct references to MPI_COMM_WORLD.
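A minimal sketch of the producer side under this approach might look like the following; the consumer executable name "consumer", the process count, and the chunk size are all assumptions, and the spawned consumers would obtain the matching intercommunicator with MPI_Comm_get_parent:

    /* producer.c -- spawns 20 consumer processes and sends them data
     * over the intercommunicator returned by MPI_Comm_spawn */
    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        MPI_Comm children;   /* intercommunicator to the consumer app */
        MPI_Comm_spawn("consumer", MPI_ARGV_NULL, 20, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

        double chunk[1024] = {0};   /* stand-in for the real data */
        /* send a chunk to rank 0 of the consumers' remote group */
        MPI_Send(chunk, 1024, MPI_DOUBLE, 0, 0, children);

        MPI_Finalize();
        return 0;
    }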
Based on your description, "Program 1" is not an MPI application and "Program 2" is an MPI application. The shortest path to a solution is likely to open a socket between the two programs and send the data that way. This does not require that "Program 1" be modified to be an MPI program. I would begin with a socket between "Program 1" and "Program 2: Rank 0", with Rank 0 distributing the data to the remaining ranks.
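A minimal sketch of the receiving side, assuming a TCP listener on port 5000 and a fixed tile size (both assumptions), with rank 0 reading from the socket and forwarding one tile to each remaining rank; error handling is omitted for brevity:

    /* program2.c -- rank 0 accepts a TCP connection from Program 1,
     * reads tiles from it, and forwards them over MPI */
    #include <arpa/inet.h>
    #include <mpi.h>
    #include <netinet/in.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const long tile_len = 1 << 20;     /* doubles per tile, illustrative */
        double *tile = malloc(tile_len * sizeof(double));

        if (rank == 0) {
            int srv = socket(AF_INET, SOCK_STREAM, 0);
            struct sockaddr_in addr;
            memset(&addr, 0, sizeof addr);
            addr.sin_family = AF_INET;
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            addr.sin_port = htons(5000);   /* port is an assumption */
            bind(srv, (struct sockaddr *)&addr, sizeof addr);
            listen(srv, 1);
            int conn = accept(srv, NULL, NULL);

            for (int dest = 1; dest < size; dest++) {
                /* read one tile's worth of bytes from Program 1 */
                long got = 0, want = tile_len * sizeof(double);
                while (got < want) {
                    ssize_t n = read(conn, (char *)tile + got, want - got);
                    if (n <= 0) break;
                    got += n;
                }
                MPI_Send(tile, tile_len, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
            }
            close(conn);
            close(srv);
        } else {
            MPI_Recv(tile, tile_len, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            /* ... compute on the tile ... */
        }

        free(tile);
        MPI_Finalize();
        return 0;
    }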
Several suggestions so far have involved launching a heterogeneous set of executables as one possible solution. There is no requirement that all the ranks in a single MPI job be the same executable. This does require that both executables be "MPI programs" (i.e. each must at least call MPI_Init and MPI_Finalize). The level of modification required to "Program 1", and the inability to run it outside of the MPI environment, may make this option unattractive.
I would recommend that you avoid the "dynamic process" approach unless you are using a commercial implementation that offers support. Support for connect/accept (MPI_Comm_connect/MPI_Comm_accept) tends to be spotty in the open source implementations of MPI. It may "just work", but getting technical help if it does not can be an open-ended problem.
It is not a good idea to mix sockets & MPI. The easiest way to achieve this is to move both process 1 & process 2 into a single MPI application.
The best way to implement this is to use the programming model called MPMD, or Multiple Program Multiple Data. As the name implies, your MPI application will have multiple programs operating on multiple sets of data. Even if Program 1 is not an MPI application, you need not make too many changes: just call MPI_Init and add the routines to send/recv data. You can think of this as a sort of Master-Slave model where Prg1 is the master and the rest are slaves getting pieces of data to work on from the master.
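A minimal sketch of that Master-Slave pattern, where rank 0 hands out tiles on demand and collects one result per tile; the tile count, tile size, and tags are illustrative assumptions:

    /* master_slave.c -- rank 0 (Prg1) distributes tiles, others compute */
    #include <mpi.h>
    #include <stdlib.h>

    #define TAG_WORK 1
    #define TAG_STOP 2

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int tile_len = 4096;         /* doubles per tile, illustrative */
        const int num_tiles = 1000;        /* illustrative */
        double *buf = malloc(tile_len * sizeof(double));

        if (rank == 0) {                   /* master: owns the 10GB arrays */
            int sent = 0, done = 0;
            /* prime every worker with one tile */
            for (int w = 1; w < size && sent < num_tiles; w++, sent++) {
                /* fill `buf` with tile `sent` here, then: */
                MPI_Send(buf, tile_len, MPI_DOUBLE, w, TAG_WORK, MPI_COMM_WORLD);
            }
            /* stop any workers that never got a tile */
            for (int w = sent + 1; w < size; w++)
                MPI_Send(NULL, 0, MPI_DOUBLE, w, TAG_STOP, MPI_COMM_WORLD);
            /* collect results; feed each idle worker the next tile */
            while (done < num_tiles) {
                double result;
                MPI_Status st;
                MPI_Recv(&result, 1, MPI_DOUBLE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                done++;
                if (sent < num_tiles) {
                    /* fill `buf` with tile `sent` here, then: */
                    MPI_Send(buf, tile_len, MPI_DOUBLE, st.MPI_SOURCE,
                             TAG_WORK, MPI_COMM_WORLD);
                    sent++;
                } else {
                    MPI_Send(NULL, 0, MPI_DOUBLE, st.MPI_SOURCE, TAG_STOP,
                             MPI_COMM_WORLD);
                }
            }
        } else {                           /* slave: compute on each tile */
            for (;;) {
                MPI_Status st;
                MPI_Recv(buf, tile_len, MPI_DOUBLE, 0, MPI_ANY_TAG,
                         MPI_COMM_WORLD, &st);
                if (st.MPI_TAG == TAG_STOP) break;
                double result = 0.0;       /* ... compute on the tile ... */
                MPI_Send(&result, 1, MPI_DOUBLE, 0, TAG_WORK, MPI_COMM_WORLD);
            }
        }

        free(buf);
        MPI_Finalize();
        return 0;
    }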
Another method could be to implement a pool of workers by making program 1 the same as program 2, with every process reading a part of the data file and starting to work. But you ruled out file I/O, so I assume programs 2..n do not have access to the file at run time. The Master-Slave approach will work best for your needs.