I'm running a completely parallel matrix multiplication program on a Mac Pro with a Xeon processor. I create 8 threads (as many threads as cores), and there are no shared writing issues (no two threads write to the same locations). For some reason, my version using pthread_create and pthread_join is about twice as slow as the version using #pragma omp.
There are no other differences in anything... same compile options, same number of threads in both cases, same code (except the pragma/pthread portions obviously), etc.
And the loops are very big -- I'm not parallelizing small loops.
(I can't really post the code because it's school work.)
Why might this be happening? Doesn't OpenMP use POSIX threads itself? How can it be faster?
What is your main thread doing? Without seeing your code, my guess is that the main thread is barely doing any useful work but is still eating up clock cycles while the pthreads finish, and only then resumes normally. Each time it's given cycles, there is overhead from pausing and resuming the other threads.
In OpenMP, the main thread probably goes to sleep, and waits for a wake-up event when the parallel regions finish.
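Since the original code wasn't posted, here is only a minimal sketch of what a pthread version looks like when the main thread does nothing but block in pthread_join while the workers run. The names task_t, worker, multiply_parallel and MAX_THREADS are hypothetical, and the row-block partitioning is an assumption, not the asker's actual scheme.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64  /* hypothetical upper bound for this sketch */

/* Hypothetical work descriptor: one thread computes rows
 * [row_begin, row_end) of C = A * B for n x n row-major matrices. */
typedef struct {
    const double *a;
    const double *b;
    double *c;
    size_t n;
    size_t row_begin;
    size_t row_end;
} task_t;

static void *worker(void *arg)
{
    task_t *t = arg;
    for (size_t i = t->row_begin; i < t->row_end; i++) {
        for (size_t j = 0; j < t->n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < t->n; k++)
                sum += t->a[i * t->n + k] * t->b[k * t->n + j];
            t->c[i * t->n + j] = sum;  /* each thread writes only its own rows */
        }
    }
    return NULL;
}

/* Returns 0 on success, -1 if a thread could not be created. */
int multiply_parallel(const double *a, const double *b, double *c,
                      size_t n, size_t nthreads)
{
    assert(n > 0 && nthreads > 0 && nthreads <= MAX_THREADS);

    pthread_t tid[MAX_THREADS];
    task_t task[MAX_THREADS];
    size_t started = 0;
    int status = 0;

    for (size_t t = 0; t < nthreads; t++) {
        /* Split the rows into nthreads contiguous blocks. */
        task[t] = (task_t){ a, b, c, n,
                            t * n / nthreads, (t + 1) * n / nthreads };
        if (pthread_create(&tid[t], NULL, worker, &task[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    /* pthread_join suspends the caller until the thread terminates,
     * so the main thread is not polling in a loop here. */
    for (size_t t = 0; t < started; t++)
        pthread_join(tid[t], NULL);

    return status;
}
```

If the main thread in the slow version does anything other than wait like this (for example, checking a "done" flag in a loop), that would be the first thing to compare against the OpenMP build.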