I would like to implement a parallel version of the code below using threads in OpenMP,is there any better way to do this?
/* Program to compute Pi using Monte Carlo methods */
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <string.h>
#include <time.h>
#define SEED 35791246
int main(int argc, char* argv)
{
int niter=0;
double x,y;
int i,count=0; /* # of points in the 1st quadrant of unit circle */
double z;
double pi;
clock_t end_time, start_time;
printf("Enter the number of iterations used to estimate pi: ");
scanf("%d",&niter);
start_time = clock();
/* initialize random numbers */
srand(SEED);
count=0;
#pragma omp parallel for
for ( i=0; i<niter; i++) {
x = (double)rand()/RAND_MAX;
y = (double)rand()/RAND_MAX;
z = x*x+y*y;
if (z<=1) count++;
}
#pragma omp task
pi=(double)count/niter*4;
#pragma omp barrier
end_ti开发者_StackOverflowme = clock();
printf("# of trials= %d , estimate of pi is %g, time= %f \n",niter,pi, difftime(end_time, start_time));
return 0;
}
It could be improved by correcting some OpenMP bugs. First, since you're summing up (copies of) count
in all of the parallel threads, you need to apply a reduction operator at the end of the parallel segment to combine all of those back into a single value. Also, the variables i
, x
, y
, and z
need to have individual instances for each parallel thread -- you don't want the threads using the same one! To specify all of that, your #pragma
directive at the top of the loop should be:
#pragma omp parallel for private(i, x, y, z) reduction(+:count)
Also, the scope of that is the for
loop, so you don't need to do anything else; there will automatically be a synchronization of the threads after the loop exits. (And you need that synchronization to get count
to contain all the increments from all threads!) In particular, your task
and barrier
pragmas are meaningless, as at that point you are back to just one thread -- and, besides, there's no point in putting that single computation in a parallel task.
And there's the issue that gabe raised about the likely slowness and/or poor randomness of the system random number generator in these cases. You will probably want to investigate the particulars of that on your system, and give it a new random seed in each thread or use a different random-number generator depending on what you find.
Besides that, it looks fairly reasonable. Not much else you can do to that algorithm, as it's short and trivially parallelizable.
The rand
function is not a good idea to use here. Either it is not threadsafe and you will have threads getting duplicate values (not very random), or it will have a lock and the MP version will be slower than the single-thread version.
I would recommend finding another random number generator that can be used simultaneously from multiple threads.
精彩评论