This may sound a little odd, but it would be extremely useful to me. Are there any regex implementations (any language, but preferably Java, JavaScript, C, or C++) that use an event-based model for matches?
I would like to be able to register a bunch of different regular expressions I am looking for in a string via an event-based model, feed the string through the regex engine, and just have the events fired off correctly. Does anything like this exist?
I realize this is bordering on the territory of a heavy-duty lexer/parser, but I would prefer to stay away from that if at all possible, as my search expressions would need to be completely dynamic.
Thanks
This is very easy to do in Perl regular expressions. All you do is insert your event callouts at the appropriate point in the pattern in the most straightforward manner imaginable.
First, imagine a pattern for pulling decimal numbers out of a string:
my $rx0 = qr/[+-]?(?:\d+(?:\.\d*)?|\.\d+)/;
Let’s expand that out so we can insert our callouts:
my $rx1 = qr{
    [+-] ?
    (?: \d+
        (?: \. \d* ) ?
      |
        \. \d+
    )
}x;
For callouts, I’ll just print some debugging, but you could do anything you want:
my $rx2 = qr{
    (?: [+-]            (?{ say "\tleading sign" })
    ) ?
    (?: \d+             (?{ say "\tinteger part" })
        (?: \.          (?{ say "\tinternal decimal point" })
            \d*         (?{ say "\toptional fractional part" })
        ) ?
      |
        \.              (?{ say "\tleading decimal point" })
        \d+             (?{ say "\trequired fractional part" })
    )                   (?{ say "\tsuccess" })
}x;
Here’s the whole demo:
use 5.010;
use strict;
use utf8;

my $rx0 = qr/[+-]?(?:\d+(?:\.\d*)?|\.\d+)/;

my $rx1 = qr{
    [+-] ?
    (?: \d+
        (?: \. \d* ) ?
      |
        \. \d+
    )
}x;

my $rx2 = qr{
    (?: [+-]            (?{ say "\tleading sign" })
    ) ?
    (?: \d+             (?{ say "\tinteger part" })
        (?: \.          (?{ say "\tinternal decimal point" })
            \d*         (?{ say "\toptional fractional part" })
        ) ?
      |
        \.              (?{ say "\tleading decimal point" })
        \d+             (?{ say "\trequired fractional part" })
    )                   (?{ say "\tsuccess" })
}x;

my $string = <<'END_OF_STRING';
The Earth’s temperature varies between
-89.2°C and 57.8°C, with a mean of 14°C.
There are .25 quarts in 1 gallon.
+10°F is -12.2°C.
END_OF_STRING

# print rather than printf: the matched text must not be treated as a format string
while ($string =~ /$rx2/gp) {
    print "Number: ${^MATCH}\n";
}
which when run produces this:
    leading sign
    integer part
    internal decimal point
    optional fractional part
    success
Number: -89.2
    integer part
    internal decimal point
    optional fractional part
    success
Number: 57.8
    integer part
    success
Number: 14
    leading decimal point
    leading decimal point
    required fractional part
    success
Number: .25
    integer part
    success
Number: 1
    leading decimal point
    leading sign
    integer part
    success
Number: +10
    leading sign
    integer part
    internal decimal point
    optional fractional part
    success
Number: -12.2
    leading decimal point
You may want to arrange a more grammatical regular expression for maintainability. This also helps for when you want to make a recursive descent parser out of it. (Yes, of course you can do that: this is Perl, after all. :)
Look at the last solution in this answer for what I mean by grammatical regexes. I also have larger examples elsewhere here on SO.
But it sounds like you should look at the Regexp::Grammars module by Damian Conway, which was built for just this sort of thing. This question talks about it, and has a link to the module proper.
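Since the question prefers Java, here is a rough approximation of the callout demo for that language. java.util.regex has no equivalent of Perl's (?{ ... }) blocks, so this sketch puts each part of the number pattern in a named group and reports, after each successful match, which groups took part. The class name NumberEvents, the group names, and the EVENTS table are all made up for illustration. Unlike the Perl version, nothing fires during failed or backtracked attempts, so the stray "leading decimal point" lines in the output above would not appear.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberEvents {
    // Same decimal-number pattern as the Perl demo, one named group per part.
    private static final Pattern NUMBER = Pattern.compile(
        "(?<sign>[+-])?"
        + "(?:(?<integer>\\d+)(?:(?<dot>\\.)(?<fraction>\\d*))?"
        + "|(?<leadingDot>\\.)(?<requiredFraction>\\d+))");

    // Hypothetical table mapping group names to the event labels of the Perl demo.
    private static final String[][] EVENTS = {
        {"sign", "leading sign"},
        {"integer", "integer part"},
        {"dot", "internal decimal point"},
        {"fraction", "optional fractional part"},
        {"leadingDot", "leading decimal point"},
        {"requiredFraction", "required fractional part"},
    };

    // Returns the events for every number found, in order of occurrence.
    static List<String> scan(String input) {
        List<String> events = new ArrayList<>();
        Matcher m = NUMBER.matcher(input);
        while (m.find()) {
            for (String[] event : EVENTS) {
                // group(name) is null when that group did not take part in the match
                if (m.group(event[0]) != null) {
                    events.add(event[1]);
                }
            }
            events.add("Number: " + m.group());
        }
        return events;
    }

    public static void main(String[] args) {
        scan("+10F is -12.2C, and there are .25 quarts.").forEach(System.out::println);
    }
}
```

In a real program the strings added to the list would be replaced by whatever listener calls you need.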
You might want to check out PIRE, a very fast automata-based regexp engine, tuned to match huge numbers of lines of text against many regular expressions quickly. It's written in C++ and has some bindings.
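If pulling in a native library is not an option, the general idea of checking many expressions in one pass can be roughly imitated in plain Java. This is not PIRE's API and not its algorithm: it just joins the registered expressions into one alternation, wraps each in a generated named group, and dispatches on whichever group matched. The class name CombinedScanner and the g0, g1, ... group names are invented here. It assumes the registered expressions contain no named groups or numbered backreferences of their own, and when two expressions could match at the same position only the one registered first is reported.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CombinedScanner {
    // Handler per generated group name, kept in registration order.
    private final Map<String, BiConsumer<String, Integer>> handlers = new LinkedHashMap<>();
    // Each registered expression wrapped in its own named group.
    private final List<String> wrapped = new ArrayList<>();

    // The handler receives the matched text and its start offset in the input.
    void register(String regex, BiConsumer<String, Integer> handler) {
        String name = "g" + wrapped.size();
        wrapped.add("(?<" + name + ">" + regex + ")");
        handlers.put(name, handler);
    }

    void scan(String input) {
        if (wrapped.isEmpty()) {
            return;
        }
        // One combined pattern, so the input is walked once for all expressions.
        Matcher m = Pattern.compile(String.join("|", wrapped)).matcher(input);
        while (m.find()) {
            for (Map.Entry<String, BiConsumer<String, Integer>> e : handlers.entrySet()) {
                String text = m.group(e.getKey());
                if (text != null) {
                    e.getValue().accept(text, m.start(e.getKey()));
                    break; // only one alternative can have matched
                }
            }
        }
    }

    public static void main(String[] args) {
        CombinedScanner scanner = new CombinedScanner();
        scanner.register("\\d+", (text, pos) -> System.out.println("number " + text + " at " + pos));
        scanner.register("[a-z]+", (text, pos) -> System.out.println("word " + text + " at " + pos));
        scanner.scan("abc 42 x");
    }
}
```

Java's engine is a backtracking one, so do not expect automaton-like throughput from this; it only saves the repeated passes over the input.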
It's really not something that's too hard to put together yourself if you can't find any existing library.
Something like this:
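import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexNotifier {
    // Listeners registered for each compiled pattern.
    private final Map<Pattern, List<RegexListener>> listeners = new HashMap<Pattern, List<RegexListener>>();

    public synchronized void register(Pattern pattern, RegexListener listener) {
        List<RegexListener> list = listeners.get(pattern);
        if (list == null) {
            list = new ArrayList<RegexListener>();
            listeners.put(pattern, list);
        }
        list.add(listener);
    }

    // Synchronized like register(), so the map is not modified while it is iterated.
    public synchronized void process(String input) {
        for (Entry<Pattern, List<RegexListener>> entry : listeners.entrySet()) {
            // find() reports every occurrence inside the input; matches() would
            // fire only when the whole input matches the pattern.
            Matcher matcher = entry.getKey().matcher(input);
            while (matcher.find()) {
                for (RegexListener listener : entry.getValue()) {
                    listener.stringMatched(matcher.group(), entry.getKey());
                }
            }
        }
    }
}

interface RegexListener {
    void stringMatched(String matched, Pattern pattern);
}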
The only shortcoming I see with this is that Pattern doesn't override hashCode() and equals(), so two Pattern instances compiled from the same expression count as different keys and that expression gets matched twice. Pattern.compile() does not cache anything; every call returns a new instance. So compile each expression once and reuse that instance when registering, or key the map by the pattern string and flags instead.
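One way to sidestep the equals()/hashCode() issue is to not use Pattern as the map key at all. The sketch below keys on the regex source plus its flags, so registering the same expression twice shares a single compiled Pattern. KeyedRegexNotifier, its nested Entry class, and patternCount() are names invented for this example, and a plain Consumer<String> stands in for the RegexListener interface to keep it short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyedRegexNotifier {
    // One compiled Pattern and its listeners per distinct (flags, regex) pair.
    private static final class Entry {
        final Pattern pattern;
        final List<Consumer<String>> listeners = new ArrayList<>();

        Entry(Pattern pattern) {
            this.pattern = pattern;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    synchronized void register(String regex, int flags, Consumer<String> listener) {
        // The flags come first, so the text before the first ':' is always the flags value.
        String key = flags + ":" + regex;
        entries.computeIfAbsent(key, k -> new Entry(Pattern.compile(regex, flags)))
               .listeners.add(listener);
    }

    // Number of distinct compiled patterns currently registered.
    synchronized int patternCount() {
        return entries.size();
    }

    // Notifies every listener of a pattern once per occurrence found in the input.
    synchronized void process(String input) {
        for (Entry entry : entries.values()) {
            Matcher m = entry.pattern.matcher(input);
            while (m.find()) {
                for (Consumer<String> listener : entry.listeners) {
                    listener.accept(m.group());
                }
            }
        }
    }

    public static void main(String[] args) {
        KeyedRegexNotifier notifier = new KeyedRegexNotifier();
        notifier.register("\\d+", 0, s -> System.out.println("first listener: " + s));
        notifier.register("\\d+", 0, s -> System.out.println("second listener: " + s));
        notifier.process("a1b22");
        System.out.println("compiled patterns: " + notifier.patternCount());
    }
}
```

Because both registrations in main() use the same source and flags, the input is scanned once for that expression and both listeners hear about each match.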