开发者

dot product of complex vectors with openMP

开发者 https://www.devze.com 2023-03-10 22:04 出处:网络
I\'m using a version of openMP which does not support reduce() for complex argument. I need a fast dot-product function like

I'm using a version of openMP which does not support reduce() for complex argument. I need a fast dot-product function like

std::complex< double > dot_prod( std::complex< double > *v1,std::complex< double > *v2,int dim )

{
    std::complex< double > sum=0.;
    int i;
# pragma omp parallel shared(sum)
# pragma omp for
    for (i=0开发者_如何学C; i<dim;i++ )
    {
#pragma omp critical
        {
            sum+=std::conj<double>(v1[i])*v2[i];
        }
    }
    return sum;
}

Obviously this code does not speed up the problem but slows it down. Do you have a fast solution without using reduce() for complex arguments?


Each thread can calculate the private sum as the first step and as the second step it can be composed to the final sum. In that case the critical section is only needed in the final step.

std::complex< double > dot_prod( std::complex< double > *v1,std::complex< double > *v2,int dim )
{
  std::complex< double > sum=0.;
  int i;
  # pragma omp parallel shared(sum)
  {
    std::complex< double > priv_sum = 0.;
    # pragma omp for
    for (i=0; i<dim;i++ )
    {
      priv_sum += std::conj<double>(v1[i])*v2[i];
    }

    #pragma omp critical
    {
      sum += priv_sum;
    }
  }
  return sum;
}


Try doing the multiplications in parallel, then sum them serially:

template <typename T>
std::complex<T> dot_prod(std::complex<T> *a, std::complex<T> *b, size_t dim)
{
    std::vector<std::complex<T> > prod(dim);  // or boost::scoped_array + new[]

    #pragma omp parallel for
    for (size_t i=0; i<dim; i++)
        // I believe you had these reversed
        prod[i] = a[i] * std::conj(b[i]);

    std::complex<T> sum(0);
    for (size_t i=0; i<dim; i++)
        sum += prod[i];

    return sum;
}

This does require O(dim) working memory, of course.


Why not have the N threads compute N individual sums. Then at the end you only need to sum the N sums, which can be done serially, as N is quite small. Although I don't know how to accomplish that with OpenMP, at the moment (I don't have any experience with it), I'm quite sure this is easily achievable.

0

精彩评论

暂无评论...
验证码 换一张
取 消