I am using OpenMP to go through a large loop in parallel. Let's say the array I'm working on has N entries in total. I would like one thread to do the first N/2 entries and the other thread the last N/2.
I need to prevent the two threads from working on entries that are adjacent to each other. The size N is always much larger than the number of threads, so I don't need to worry about locks if I can get OpenMP to distribute the work the way I outlined above.
If the size N is known at compile time, I can use #pragma omp parallel for schedule(static, N/2). Unfortunately it isn't. So, how do I set the chunk size dynamically?
There's no problem as long as N is known at runtime; I'm not sure why you think it has to be known at compile time. OMP loop constructs would be of very limited use indeed if everything had to be known at compile time.
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(int argc, char **argv) {
    int n;
    int chunksize;

    if (argc != 2) {
        fprintf(stderr, "Usage: %s n, where n = number of iterations.\n", argv[0]);
        exit(-1);
    }

    n = atoi(argv[1]);
    if (n < 1 || n > 200) n = 10;

    chunksize = n / 2;   /* each thread gets one contiguous half of the iterations */

    #pragma omp parallel num_threads(2) default(none) shared(n, chunksize)
    {
        int nthread = omp_get_thread_num();
        #pragma omp for schedule(static, chunksize)
        for (int i = 0; i < n; i++) {
            printf("Iter %d being done by thread %d\n", i, nthread);
        }
    }
    return 0;
}
And it runs simply enough, as so:
$ gcc -v
[...]
gcc version 4.4.0 (GCC)
$ gcc -o loop loop.c -fopenmp
$ ./loop 10
Iter 5 being done by thread 1
Iter 6 being done by thread 1
Iter 7 being done by thread 1
Iter 8 being done by thread 1
Iter 9 being done by thread 1
Iter 0 being done by thread 0
Iter 1 being done by thread 0
Iter 2 being done by thread 0
Iter 3 being done by thread 0
Iter 4 being done by thread 0
If you don't want to use the built-in OpenMP scheduling options that @Jonathan Dursi's answer shows, you could implement the split yourself:
#include <stdio.h>
#include <omp.h>

/* $ gcc -O3 -fopenmp -Wall *.c && ./a.out */

static void doloop(int n) {
    int thread_num, num_threads, start, end, i;
    #pragma omp parallel private(i, thread_num, num_threads, start, end)
    {
        thread_num = omp_get_thread_num();
        num_threads = omp_get_num_threads();
        /* split [0, n) into num_threads contiguous ranges */
        start = thread_num * n / num_threads;
        end = (thread_num + 1) * n / num_threads;
        for (i = start; i != end; ++i) {
            printf("%d %d\n", thread_num, i);
        }
    }
}

int main() {
    omp_set_num_threads(2);
    doloop(10);
    return 0;
}
Output
0 0
0 1
0 2
0 3
0 4
1 5
1 6
1 7
1 8
1 9
I had a similar problem in .NET, and ended up writing a smart queue object that would return a dozen items at a time, once they became available. With a batch in hand, I'd pick a thread that could process all of them in one go.
While working on this, I kept in mind that W-queues are better than M-queues: it's better to have one long line served by multiple workers than a separate line for each worker.
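The same "one shared line, many workers" idea carries over to OpenMP. Here's a minimal C sketch of what that might look like: the shared counter next plays the role of the queue head, and BATCH (a dozen, echoing the above) and the work() function are illustrative placeholders, not anything from the original .NET code.

#include <stdio.h>
#include <omp.h>

#define BATCH 12   /* illustrative batch size, mirroring "a dozen objects at a time" */

static void work(int i) {
    /* placeholder for whatever processing one item needs */
    printf("item %d handled by thread %d\n", i, omp_get_thread_num());
}

int main(void) {
    int n = 100;   /* total number of items in the shared "queue" */
    int next = 0;  /* head of the queue: index of the next unclaimed item */

    #pragma omp parallel shared(n, next)
    {
        for (;;) {
            int start;
            /* claim the next batch of up to BATCH items from the shared queue */
            #pragma omp critical
            { start = next; next += BATCH; }

            if (start >= n) break;   /* queue exhausted */
            int end = start + BATCH;
            if (end > n) end = n;

            for (int i = start; i < end; i++)
                work(i);             /* process the whole batch in one go */
        }
    }
    return 0;
}

Because each thread only enters the critical section once per batch rather than once per item, contention stays low even though every thread is pulling from the same queue.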