Recommendations for a C implementation of a regex parser

Developer https://www.devze.com 2023-01-29 17:06 Source: Internet
I'm thinking about implementing a regular expression parser in a C library I'm developing. Now, the question is: is there any open source code that I could use verbatim or with as few changes as possible? My expectations regarding the code are:

  • it needs to be written in C (not C++)
  • it needs to compile under gcc, MinGW, MSVC
  • it mustn't depend on any third party or OS-specific headers/libraries (i.e., everything needed to compile it must be readily available with a base installation of gcc, MinGW, MSVC)
  • it would be nice if it used Perl-compatible regex syntax (like PCRE in PHP).
  • ideally, the code should be as compact as possible

Are there any ready-made solutions that you could recommend? I was looking at PCRE for C and it looks like it has everything that's available in PHP (which rules), but the size (1.4 MB download) is a bit intimidating. Do you think it's a solid bet? Or are there other options worth considering?

[EDIT]

The library I'm developing is open source, BSD licence.


PCRE is so big because regular expressions are hard. And most of it is documentation and support code anyways; it's much smaller when compiled into object code.


RE2, the Google regexp implementation, matches in linear time (O(n), where n is the length of the string), whereas PCRE and most other backtracking regexp engines can take exponential time in the worst case. Another noteworthy O(n) regexp matcher is flex, but it needs all possible regexps at compile time. If you are looking for something smaller than PCRE, look at the regexp matcher in busybox, or the pattern matcher in Lua.
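To give a feel for how small a matcher can get once you give up most of the syntax, here is a sketch closely following the well-known tiny matcher by Rob Pike from "The Practice of Programming". It is not the busybox or Lua code, and it is a backtracking matcher, so it does not have the linear-time guarantee mentioned above. The name `tiny_match` is made up for this example; only literals, `.`, `^`, `$` and `*` are supported.

```c
/* Tiny backtracking matcher, after Rob Pike's example in
 * "The Practice of Programming". Supported syntax (far less than PCRE):
 *   c  literal character      .  any single character
 *   ^  anchor at start        $  anchor at end
 *   *  zero or more of the previous character
 * tiny_match is a hypothetical name chosen for this sketch.
 */
static int match_here(const char *re, const char *text);

/* match_star: search for c*re at the beginning of text */
static int match_star(int c, const char *re, const char *text)
{
    do {    /* try the rest of the pattern after 0, 1, 2, ... copies of c */
        if (match_here(re, text))
            return 1;
    } while (*text != '\0' && (*text++ == c || c == '.'));
    return 0;
}

/* match_here: search for re at the beginning of text */
static int match_here(const char *re, const char *text)
{
    if (re[0] == '\0')
        return 1;                       /* pattern exhausted: match */
    if (re[1] == '*')
        return match_star(re[0], re + 2, text);
    if (re[0] == '$' && re[1] == '\0')
        return *text == '\0';           /* $ only matches at end of text */
    if (*text != '\0' && (re[0] == '.' || re[0] == *text))
        return match_here(re + 1, text + 1);
    return 0;
}

/* tiny_match: search for re anywhere in text; returns 1 on match, else 0 */
int tiny_match(const char *re, const char *text)
{
    if (re[0] == '^')
        return match_here(re + 1, text);
    do {    /* must try even when text is empty */
        if (match_here(re, text))
            return 1;
    } while (*text++ != '\0');
    return 0;
}
```

Calling `tiny_match("a*b", "aaab")` returns 1, while `tiny_match("^b", "ab")` returns 0. It uses no headers at all, so it builds with any C compiler, but it has no character classes, alternation, grouping or escapes.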


You might try TRE if you're happy with POSIX regex syntax. If you want Perl syntax, Google has a new implementation worth checking out.
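For reference, this is roughly what the POSIX-style interface looks like from the caller's side. The sketch below uses the standard `regcomp`/`regexec`/`regfree` functions from the system `<regex.h>`; it assumes a POSIX platform, and that header is not part of a base MSVC install, which is exactly the gap a bundled library such as TRE would have to fill. The helper name `posix_match` is made up for the example, and TRE's own headers and function names may differ from what is shown here.

```c
#include <regex.h>
#include <stddef.h>

/* posix_match: hypothetical helper for this sketch.
 * Returns 1 if text matches the POSIX extended regex in pattern,
 * 0 if it does not, and -1 if the pattern fails to compile. */
int posix_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_EXTENDED selects ERE syntax; REG_NOSUB skips capture tracking */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;

    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);                /* always release the compiled pattern */
    return rc == 0 ? 1 : 0;
}
```

For instance, `posix_match("^[0-9]+(\\.[0-9]+)?$", "3.14")` returns 1. Note that Perl extensions such as `\d` or non-greedy quantifiers are not part of POSIX syntax.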


PCRE is pretty much the de facto standard of regex implementations (for a good reason). Don't worry about the size, it's big because regex implementations are complicated. Just use it anyway.
