I'm using a version of OpenMP which does not support the reduction clause for complex arguments. I need a fast dot-product function like:
std::complex< double > dot_prod( std::complex< double > *v1,std::complex< double > *v2,int dim )
{
std::complex< double > sum=0.;
int i;
# pragma omp parallel shared(sum)
# pragma omp for
for (i=0; i<dim; i++)
{
#pragma omp critical
{
sum+=std::conj<double>(v1[i])*v2[i];
}
}
return sum;
}
Obviously this code does not speed up the computation, it slows it down. Do you have a fast solution that avoids the reduction clause for complex arguments?
Each thread can calculate its own private sum as a first step; in a second step the private sums are combined into the final sum. That way the critical section is only needed for the final step.
std::complex< double > dot_prod( std::complex< double > *v1,std::complex< double > *v2,int dim )
{
std::complex< double > sum=0.;
int i;
# pragma omp parallel shared(sum)
{
std::complex< double > priv_sum = 0.; // per-thread partial sum
# pragma omp for
for (i=0; i<dim;i++ )
{
priv_sum += std::conj<double>(v1[i])*v2[i];
}
#pragma omp critical // only entered once per thread
{
sum += priv_sum;
}
}
return sum;
}
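For completeness, a minimal usage sketch for the function above (the vector size and test values are placeholders of mine, not from the question); build with OpenMP enabled, e.g. g++ -fopenmp:
#include <complex>
#include <iostream>
#include <vector>

int main()
{
    const int dim = 1000000;
    // two test vectors filled with arbitrary values
    std::vector<std::complex<double> > a(dim, std::complex<double>(1., 2.));
    std::vector<std::complex<double> > b(dim, std::complex<double>(3., -1.));
    // pass raw pointers, matching the signature above
    std::cout << dot_prod(&a[0], &b[0], dim) << std::endl;
    return 0;
}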
Try doing the multiplications in parallel, then sum them serially:
template <typename T>
std::complex<T> dot_prod(std::complex<T> *a, std::complex<T> *b, size_t dim)
{
std::vector<std::complex<T> > prod(dim); // or boost::scoped_array + new[]
#pragma omp parallel for
for (size_t i=0; i<dim; i++) // note: unsigned loop counters require OpenMP 3.0
// I believe you had these reversed
prod[i] = a[i] * std::conj(b[i]);
std::complex<T> sum(0);
for (size_t i=0; i<dim; i++)
sum += prod[i];
return sum;
}
This does require O(dim) working memory, of course.
Why not have the N threads compute N individual sums? At the end you only need to add up the N sums, which can be done serially, since N is quite small. I don't know how to accomplish that with OpenMP at the moment (I don't have any experience with it), but I'm quite sure it is easily achievable; a sketch follows.
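A minimal sketch of that idea with OpenMP (the array name partial and the calls omp_get_max_threads()/omp_get_thread_num() are my choice of illustration, not from the question); it is essentially the private-sum answer above, except the per-thread results live in a shared array, which may suffer from false sharing:
#include <complex>
#include <vector>
#include <omp.h>

std::complex<double> dot_prod(std::complex<double> *v1, std::complex<double> *v2, int dim)
{
    // one partial sum per thread, indexed by the thread's id
    std::vector<std::complex<double> > partial(omp_get_max_threads(), 0.);
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        #pragma omp for
        for (int i = 0; i < dim; i++)
            partial[tid] += std::conj(v1[i]) * v2[i];
    }
    // add up the N per-thread results serially; N is small
    std::complex<double> sum = 0.;
    for (size_t t = 0; t < partial.size(); t++)
        sum += partial[t];
    return sum;
}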