I'm somewhat new to OpenMP but have experience with parallel processing in general. I worked with boost::threads before and am now experimenting with OpenMP.
The problem is that I don't know how to handle shared data access, because I don't really know what OpenMP does internally with shared data objects inside parallel loops.
What I'm doing right now (this is working so far): I read files from disk into memory with mmap and receive a pointer to char after the memory-mapping step.
OpenMP can use this pointer inside a parallel for loop and share the data between threads. I'm now able to search for regular expression matches inside the mapped and shared file, with multiple threads checking every string against a (pretty long) list of regular expressions.
I made this list (a vector containing regexes) private inside the OpenMP loop, so every thread has its own copy of it.
Here comes the problem:
To dramatically increase performance of my application I need to be able to remove (regex-)items from this vector once they match a string.
Now all the other active threads need to have this item removed from their lists too, as soon as possible.
So I made this list a shared data object inside the OpenMP loop, but now I get segmentation faults at runtime when I try to write to the list (vector.erase(item#)).
With boost::threads I would just have used a mutex for locking this object while writing/reading it.
But OpenMP seems to handle most of the synchronization itself, so I wonder what the correct approach to this problem is with OpenMP, which is new to me.
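Roughly, the failing version looks like this (a stripped-down sketch; the string container and the regex type are just placeholders for my real data):

#include <regex>
#include <string>
#include <vector>

// std::regex and std::vector<std::string> stand in for my real types
void scan(const std::vector<std::string>& lines, std::vector<std::regex>& patterns)
{
    #pragma omp parallel for
    for (long i = 0; i < static_cast<long>(lines.size()); ++i) {
        // patterns is shared by default: all threads read the same vector
        for (std::size_t p = 0; p < patterns.size(); ++p) {
            if (std::regex_search(lines[i], patterns[p])) {
                patterns.erase(patterns.begin() + p); // unsynchronized write -> segfault
                break;
            }
        }
    }
}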
For synchronization, you may use #pragma omp critical, or you may use the OpenMP lock routines (omp_{init,set,unset,destroy}_lock).
The benefits of #pragma omp critical are simplicity and the ability to ignore the pragma when the parallel region is known to be executed by a single thread. The drawbacks are applicability only to a single parallel region and its global effect within this region: no other thread can execute any other critical section in the region.
OpenMP lock routines are similar to most other available locks, e.g. those of pthreads or Boost (RAII aside). You initialize a lock object, then use it to protect certain critical sections, and destroy it when it's no longer necessary. These locks can be used to protect access to data from different parallel regions, to build a distributed locking scheme, etc.; but a certain amount of overhead is always incurred, and usage is certainly more "hairy" compared to #pragma omp critical.
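For illustration, a minimal sketch of the lock-routine variant (the names are mine, not taken from your code):

#include <omp.h>
#include <regex>
#include <vector>

std::vector<std::regex> patterns;   // shared between threads
omp_lock_t patterns_lock;

void remove_pattern(std::size_t p)
{
    omp_set_lock(&patterns_lock);        // blocks until the lock is acquired
    if (p < patterns.size())
        patterns.erase(patterns.begin() + p);
    omp_unset_lock(&patterns_lock);      // release it
}

// before the parallel region:  omp_init_lock(&patterns_lock);
// after all parallel work:     omp_destroy_lock(&patterns_lock);

Any read of the shared vector would need the same lock, for the reason given below.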
However, I would challenge the design of the parallel solution. Erasing an element from the middle of a vector invalidates all iterators and moves elements. Erase is supposedly a rare operation (otherwise, the choice of vector would be questionable even in serial code, I think), but due to the above effects you have to protect all reads of the vector too, and that will likely be costly. Read/write locks could give some relief, but those are not available in OpenMP, so you would need to use either platform-specific interfaces or a 3rd-party library.
I think the following will potentially work better:
- You keep regex vectors private, and add a same-size shared vector of flags that indicate whether a certain regex is still valid or not.
- Before applying a certain regex from a private vector, the code checks in the shared vector if this regex was not "erased" by some other thread. If it was, the regex is skipped.
- After finding a match, the code marks the element of the shared vector that corresponds to the current regex as "erased" so that it will be ignored from now on.
In this scheme, there exist races for reading/writing the flags: a flag might be set to "erased" the very moment after it was read as "valid" by another thread. As a result, two different threads may concurrently find a match for the same regex. However, I believe this problem also exists in your current solution where all regex containers are private, as well as in a solution with a shared container and locks or RW locks, unless a non-RW lock also protects the operation with a given regex. If multiple matches are a problem, it all should be rethought.
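A rough sketch of that scheme (the names and the int flag vector are mine; #pragma omp atomic read/write just keeps the flag accesses to plain, well-defined loads and stores):

#include <regex>
#include <string>
#include <vector>

void scan(const std::vector<std::string>& lines,
          const std::vector<std::regex>& master_patterns)
{
    // One flag per regex, shared by all threads: 1 = still valid, 0 = "erased".
    std::vector<int> valid(master_patterns.size(), 1);

    #pragma omp parallel
    {
        // Each thread keeps its own private copy of the regex list, as before.
        std::vector<std::regex> patterns = master_patterns;

        #pragma omp for
        for (long i = 0; i < static_cast<long>(lines.size()); ++i) {
            for (std::size_t p = 0; p < patterns.size(); ++p) {
                int still_valid;
                #pragma omp atomic read
                still_valid = valid[p];
                if (!still_valid)
                    continue;                  // some thread already matched this regex

                if (std::regex_search(lines[i], patterns[p])) {
                    #pragma omp atomic write
                    valid[p] = 0;              // mark as "erased" for everyone
                }
            }
        }
    }
}

The point of the design is that threads only ever read their private regex objects and share nothing but the cheap flags, so no locking of the containers is needed at all.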
You can achieve this by creating a critical section.
#pragma omp critical
{
...some synchronized code...
}
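Applied to your case, it could look roughly like this (a sketch; the names are made up):

#pragma omp critical
{
    // every access to the shared vector -- reads as well as the erase --
    // has to go through the critical section, since a concurrent erase
    // invalidates iterators and indices
    if (p < patterns.size() && std::regex_search(line, patterns[p]))
        patterns.erase(patterns.begin() + p);
}

Keep in mind that this serializes every access to the shared list, so it can easily become the bottleneck.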
EDIT: Removed the part about '#pragma omp atomic' as it cannot atomically perform the operations needed.