I'm using the following regex to match and capture the string weather in foo bar:
weather in ([a-z]+|[0-9]{5})\s?([a-zA-Z]+)?
It matches and captures both parts, with bar being optional and foo being either a city or a zip code.
However, I would love to allow the user to write weather in foo for bar, since I have accidentally written this a few times myself. Is there any way to optionally capture a literal string like for without having to resort to \s?f?o?r?\s??
Put it in a non-capturing group: (?:\sfor\s)?
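As a rough sketch of how that could look, the snippet below splices the non-capturing group into the pattern from the question, between its two capture groups. The placement of the group and the helper name parse_weather are assumptions made for illustration, not something the question or this answer spells out.

```perl
use strict;
use warnings;

# Sketch only: the question's pattern with an optional, non-capturing
# " for " placed between the two capture groups (placement is an assumption).
my $regex = qr/weather in ([a-z]+|[0-9]{5})(?:\sfor\s)?\s?([a-zA-Z]+)?/;

# Hypothetical helper: returns (place, when), or an empty list on no match.
sub parse_weather {
    my ($text) = @_;
    return unless $text =~ $regex;
    my ($place, $when) = ($1, $2);
    $when = '' unless defined $when;    # group 2 is optional
    return ($place, $when);
}

for my $text ('weather in foo bar',
              'weather in foo for bar',
              'weather in 12345 for today') {
    my ($place, $when) = parse_weather($text);
    print "'$place', '$when'\n";
}
# prints:
# 'foo', 'bar'
# 'foo', 'bar'
# '12345', 'today'
```

Because the group is non-capturing, the numbering of the two capture groups stays the same. One limitation of this sketch: the group needs whitespace on both sides of for, so in an input like weather in foo for (nothing after it) the word for ends up in the second capture group.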
Keeping all three capture groups intact takes a little more work.
This may be a bit advanced, but it is a good example of where lookaround assertions are helpful.
/weather\s+in\s+([[:alpha:]]+|\d{5})\s*((?<=\s)for(?=\s|$)|)\s*((?<=\s)[[:alpha:]]+|)/
Test case in Perl:
use strict;
use warnings;
my @samples = (
    'this is the weather in 12345 forever',
    'this is the weather in 32156 for ',
    'this is the weather in 32156 for today',
    'this is the weather in abcdefghijk for',
    'this is the weather in abcdefghijk ',
    'this is the weather in abcdefghijk end',
);
my $regex = qr/
    weather \s+ in \s+      # literal words separated by whitespace
    (                       # Group 1
        [[:alpha:]]+        #   city (letters only, no spaces)
      | \d{5}               #   OR zip code (5 digits)
    )                       # end group 1
    \s*                     # optional whitespace
    (                       # Group 2
        (?<=\s)             #   there must be whitespace behind us
        for                 #   literal 'for'
        (?=\s|$)            #   followed by whitespace or end of string
      |                     #   OR match nothing
    )                       # end group 2
    \s*                     # optional whitespace
    (                       # Group 3
        (?<=\s)             #   there must be whitespace behind us
        [[:alpha:]]+        #   one or more letters
      |                     #   OR match nothing
    )                       # end group 3
/x;
for (@samples) {
    # $regex was already compiled with the x flag, so it is not repeated here
    if (/$regex/) {
        print "'$1', '$2', '$3'\n";
    }
}
Output:
'12345', '', 'forever'
'32156', 'for', ''
'32156', 'for', 'today'
'abcdefghijk', 'for', ''
'abcdefghijk', '', ''
'abcdefghijk', '', 'end'