From OpenMP to MPI_问答_开发者_运维开发者技术经验分享

I just wonder how to convert the following openMP program to a MPI program

#include <omp.h>  
#define CHUNKSIZE 100  
#define N     1000  

int main (int argc, char *argv[])    
{  

int i, chunk;  
float a[N], b[N], c[N];  

/* Some initializations */  
for (i=0; i < N; i++)  
  a[i] = b[i] = i * 1.0;  
chunk = CHUNKSIZE;  

#pragma omp parallel shared(a,b,c,chunk) private(i)  
  {  

  #pragma omp for schedule(dynamic,chunk) nowait  
  for (i=0; i < N; i++)  
    c[i] = a[i] + b[i];  

  }  /* end of parallel section */  

return 0;  
}

I have a similar program that I would like to run on a cluster and the program is using OpenMP.

Thanks!

UPDATE:

In the following toy code, I want to limit the parallel part within function f():

#include "mpi.h"  
#include <stdio.h>  
#include <string.h>  

void f();

int main(int argc, char **argv)  
{  
printf("%s\n", "Start running!");  
f();  
printf("%s\n", "End running!");  
return 0;  
}  


void f()  
{  
char idstr[32]; char buff[128];  
int numprocs; int myid; int i;  
MPI_Status stat;  

printf("Entering function f().\n");

MPI_Init(NULL, NULL);  
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);  
MPI_Comm_rank(MPI_COMM_WORLD,&myid);  

if(myid == 0)  
{  
  printf("WE have %d processors\n", numprocs);  
  for(i=1;i<numprocs;i++)  
  {  
    sprintf(buff, "Hello %d", i);  
    MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD); }  
    for(i=1;i<numprocs;i++)  
    {  
      MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);  
      printf("%s\n", buff);  
    }  
}  
else  
{  
  MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);  
  sprintf(idstr, " Processor %d ", myid);  
  strcat(buff, idstr);  
  strcat(buff, "reporting for duty\n");  
  MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);  
}  
MPI_Finalize();  

printf("Leaving function f().\n");  
}

However, the running output is not expected. The printf parts before and after the parallel part have been executed by every process, not just the main process:

$ mpirun -np 3 ex2  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
Start running!  
Entering function f().  
WE have 3 processors  
Hello 1 Processor 1 reporting for duty  

Hello 2 Processor 2 reporting for duty  

Leaving function f().  
End running!  
Leaving function f().  
End runnin开发者_开发技巧g!  
Leaving function f().  
End running!

So it seems to me the parallel part is not limited between MPI_Init() and MPI_Finalize().

To answer your update:

When using MPI, the same program is run by each processor. In order to restrict parallel parts you will need to use a statement like:

if (rank == 0) { ...serial work... }

This will ensure that only one processor does the work inside this block.

You can see how this works in the example program you posted, inside f(), there is the if(myid == 0) statement. This block of statements will then only be executed by process 0, all other processes go straight to the else and receive their messages, before sending them back.

With regard to MPI_Init and MPI_Finalize -- MPI_Init initialises the MPI environment. Once you have called this method you can use the other MPI methods like Send and Recv. Once you have finished using MPI methods, MPI_Finalize will free up the resources etc, but the program will keep running. For example, you could call MPI_Finalize before performing some I/O that was going to take a long time. These methods do not delimit the parallel portion of the code, merely where you can use other MPI calls.

Hope this helps.

You just need to assign a portion of the arrays (a, b, c) to each process. Something like this:

#include <mpi.h>

#define N 1000

int main(int argc, char *argv[])
{
  int i, myrank, myfirstindex, mylastindex, procnum;
  float a[N], b[N], c[N];

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &procnum);
  MPI_Comm_rank(comm, &myrank);


  /* Dynamic assignment of chunks,
   * depending on number of processes
   */
  if (myrank == 0)
    myfirstindex = 0;
  else if (myrank < N % procnum)
    myfirstindex = myrank * (N / procnum + 1);
  else
    myfirstindex = N % procnum + myrank * (N / procnum);

  if (myrank == procnum - 1)
    mylastindex = N - 1;
  else if (myrank < N % procnum)
    mylastindex = myfirstindex + N / procnum + 1;
  else
    mylastindex = myfirstindex + N / procnum;

  // Initializations
  for(i = myfirstindex; i < mylastindex; i++)  
    a[i] = b[i] = i * 1.0; 

  // Computations
  for(i = myfirstindex; i < mylastindex; i++)
    c[i] = a[i] + b[i];

  MPI_Finalize();
}

You can try to use proprietary Intel Cluster OpenMP. It will run OpenMP programs on cluster. Yes, it simulates shared memory computer on distributed memory clusters using the "Software Distributed shared memory" http://en.wikipedia.org/wiki/Distributed_shared_memory

It is easy to use and included in Intel C++ Compiler (9.1+). But it works only on 64-bit processors.