Parallelize inner loop using OpenMP

Developer (https://www.devze.com), 2023-02-08 12:38. Source: Internet
I have three nested loops, but only the innermost is parallelizable. The stop conditions of the outer and middle loops depend on the calculations done by the innermost loop, so I cannot change the loop order.

I have put an OpenMP pragma directive just before the innermost loop, but the performance with two threads is worse than with one. I guess this is because the threads are being created on every iteration of the outer loops.

Is there any way to create the threads outside the outer loops but use them only in the innermost loop?

Thanks in advance


OpenMP should be using a thread pool, so you won't be recreating threads every time you execute your loop. Strictly speaking, that may depend on the OpenMP implementation you are using (I know the GNU compiler uses a pool). I suggest you look for other common problems, such as false sharing.
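To make the false-sharing remark a bit more concrete: a typical trigger is every thread repeatedly writing to its own slot of a small shared array, where the slots sit on the same cache line. Here is a minimal sketch of one common way around that, letting each thread accumulate into a private copy through a reduction clause. The function name `sum_array` and the summing workload are made up for illustration; they are not from the question.

```c
#include <stddef.h>

/* Hypothetical helper (not from the question): sums n doubles.
 * With reduction(+:total), each thread accumulates into its own private
 * copy of `total`, and OpenMP combines the copies once at the end of the
 * loop, so threads do not keep writing to neighbouring memory locations. */
double sum_array(const double *data, size_t n)
{
    double total = 0.0;

    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < (long)n; i++) {
        total += data[i];
    }
    return total;
}
```

With GCC this needs `-fopenmp`; without the flag the pragma is ignored and the loop simply runs serially.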


Unfortunately, current multicore computer systems are poorly suited to such fine-grained inner-loop parallelism. This is not a thread creation/forking issue. As Itjax pointed out, virtually all OpenMP implementations use thread pools: they pre-create a number of threads and park them while idle. So there is effectively no overhead from creating threads.

However, parallelizing an inner loop like this incurs two kinds of overhead:

  • Dispatching jobs/tasks to threads: even if no threads have to be physically created, work must still be assigned to them (that is, logical tasks must be created), which usually requires synchronization.
  • Joining threads: after all threads in a team have finished their share of the work, they must be joined (unless the OpenMP nowait clause is used). This is typically implemented as a barrier, which is also a costly synchronization operation.

Hence, one should minimize the number of times work is dispatched to and joined from threads. You can reduce this overhead by increasing the amount of work the inner loop does per invocation, for example through code changes such as loop unrolling.
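As for the original question of opening the thread team outside the outer loops: a rough sketch of that pattern is to put `#pragma omp parallel` around the outer loop and use a worksharing `#pragma omp for` on the inner one. The workload below (halving an array until its sum drops below a threshold) and the name `halve_until_below` are purely hypothetical stand-ins, and only two loop levels are shown instead of three. Note that every thread executes the outer loop, so the stop condition has to be evaluated consistently by all of them.

```c
/* Hypothetical workload (not from the question): halve every element of a[]
 * until the sum of the elements drops below `threshold`.
 * Returns the number of outer-loop iterations performed. */
int halve_until_below(double *a, int n, double threshold)
{
    int iterations = 0;   /* shared: updated by one thread at a time */
    double sum = 0.0;     /* shared: reduction target of the inner loop */

    #pragma omp parallel  /* thread team is entered once, outside the outer loop */
    {
        int done = 0;     /* private: each thread keeps its own copy */

        while (!done) {
            /* One thread resets the shared state; the implicit barrier at
             * the end of `single` makes the reset visible to all threads. */
            #pragma omp single
            {
                sum = 0.0;
                iterations++;
            }

            /* Inner loop: iterations are split across the existing team. */
            #pragma omp for reduction(+:sum)
            for (int i = 0; i < n; i++) {
                a[i] *= 0.5;
                sum += a[i];
            }
            /* Implicit barrier here: `sum` is now complete. */

            done = (sum < threshold);

            /* Keep `sum` unchanged until every thread has read it. */
            #pragma omp barrier
        }
    }
    return iterations;
}
```

This removes the repeated entry and exit of the parallel region, but the barriers per outer iteration are still there, so whether it is actually faster depends on how much work each inner loop invocation does. Measure before relying on it.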
