开发者

Correct implementation of Boyer Moore algorithm

开发者 https://www.devze.com 2023-01-07 21:36 出处:网络
I tried to use several implementations, but all of them had bugs. Search at SO gave me http://www-igm.univ-mlv.fr/~lecroq/string/node14.html - looks nice, but this implementation gave me wrong results

I tried to use several implementations, but all of them had bugs.

Search at SO gave me http://www-igm.univ-mlv.fr/~lecroq/string/node14.html - looks nice, but this implementation gave me wrong results - sometimes it doesn't found a string.

I spent couple hours开发者_开发问答 to find the bug.

Following line looks fine:

j += MAX(bmGs[i], bmBc[y[i + j]] - m + 1 + i);

but y is char * and char is signed! It means that y[i + j] could be negative (what happens in one of my tests).

My question is: Where to find correct implementation of Boyer Moore algorithm?


char isn't definitively signed or unsigned - it's unspecified, and left up to the implementation to define.

If the algorithm depends on char being unsigned, then it should explicitly cast the input pointers to unsigned char (which is how the C standard library string handling functions are defined to work - all comparisons are done treating the characters in the string as unsigned char).


Have you tried Wikipedia?

Or the PDF written by the inventors of the algorithm?


As of C++17, this is built into the STL. Just use std::boyer_moore_searcher. For example:

#include <algorithm>
#include <string>
#include <functional>

int main()
{
    std::string haystack = "The quick brown fox jumped over the lazy dog";
    std::string needle = "brown";
    const auto s = std::boyer_moore_searcher<std::string::iterator>(needle.begin(), needle.end());
    const auto it = std::search(haystack.begin(), haystack.end(), s);
    return 0;
}
0

精彩评论

暂无评论...
验证码 换一张
取 消