i'm facing a problem right now, i need to count the number of time a certain MxM matrix appears inside a NxN one (this o开发者_如何转开发ne should be larger than the first one). Any hints on how to do this? I'm going to implement it in C and there is no option to change it.
Revision 1
Hi everyone, i would really like to say thank to all the answers and opinions on the matter. I should tell you that after many hours of hard work we have come to a solutions that is not strictly like the Boyer-Moore approach, but rather an algorithm on my own. I'm planning on publishing it once is tested and finished. The solutions is now being adapted to be paralelized for speed optimization using the university cluster with the C Library MPI.
Hm, sounds like a 2-dimensional version of string matching. I wonder if there's a 2D version of Boyer-Moore?
A Boyer-Moore Approach for Two-Dimensional Matching
Ah, apparently there is. :-)
One approach that immediately came to mind:
Treat the big matrix as a linear string of N*N "characters" and the small matrix as a linear regular expression of length (M+1)*M with "literal characters" in the first M positions of each "row" and the equivalent of .{N-M}
in the remaining position of each row. Compile the regular expression to a DFA and run it.
Time to find all matches is O(N*N). I suspect there are fancier algorithms that would work (adaptations of fancier 1-d substring search algorithms) but this one's not too bad.
Recently, fast algorithms have been developed for building 2D suffix trees. These can be used to find any square submatrix in a larger matrix in time independent of the size of the larger matrix:
- http://www.springerlink.com/content/96511486841752h4/
- Link
You might need to be a subscriber to see those papers.
I also found this freely available one, which describes a randomised construction algorithm that is expected linear-time:
- http://www.cs.nyu.edu/cs/faculty/cole/papers/CH00.ps
I expect that constructing a 2D suffix tree and searching with it would be faster if you need to search for many different submatrices in a single matrix, but it's probably slower than 2D Boyer-Moore if you'll be searching inside a different matrix each time.
精彩评论