
Optionally match a literal string

Source: https://www.devze.com, 2023-02-05 14:43 (reposted from the web)

I'm using the following regex to match and capture the string weather in foo bar:

weather in ([a-z]+|[0-9]{5})\s?([a-zA-Z]+)?

This matches and captures both parts, with bar being optional and foo being either a city name or a zip code.

However, I would love to allow the user to write weather in foo for bar, since I have accidentally written this a few times myself. Is there any way to optionally capture a literal string like for without having to resort to \s?f?o?r?\s??


Put it in a non-capturing group: (?:\sfor\s)?
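As a minimal sketch of how that group might slot into the regex from the question (the sample strings and the parse_weather helper are assumptions made up for illustration, not part of the original post):

```perl
use strict;
use warnings;

# The question's regex with the optional non-capturing group added
# right after the first capture group.
my $re = qr/weather in ([a-z]+|[0-9]{5})(?:\sfor\s)?\s?([a-zA-Z]+)?/;

# Hypothetical helper: returns (place, qualifier), or an empty list on no match.
sub parse_weather {
    my ($text) = @_;
    return $text =~ $re ? ($1, $2 // '') : ();
}

for my $s ('weather in foo bar', 'weather in foo for bar', 'weather in 12345 for today') {
    my ($place, $what) = parse_weather($s);
    print "'$place', '$what'\n";
}
# prints:
# 'foo', 'bar'
# 'foo', 'bar'
# '12345', 'today'
```

Because the group requires whitespace on both sides of for, an input such as weather in foo for (with nothing after it) skips the group and captures for as the second group instead.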


Maintaining the integrity of the three capture groups requires a little more work.
This might be a little advanced, but it is a good example of where assertions are helpful.

/weather\s+in\s+([[:alpha:]]+|\d{5})\s*((?<=\s)for(?=\s|$)|)\s*((?<=\s)[[:alpha:]]+|)/

Test case in Perl:

use strict;
use warnings;

my @samples = (
 'this is  the weather in 12345 forever',
 'this is  the weather in 32156 for ',
 'this is  the weather in 32156 for today',
 'this is  the weather in abcdefghijk for',
 'this is  the weather in abcdefghijk ',
 'this is  the weather in abcdefghijk end',
);

my $regex = qr/
  weather \s+ in \s+    # literal words separated by whitespace
   (                    # Group 1
       [[:alpha:]]+        # city (letters only, no spaces)
     | \d{5}               # OR a zip code (5 digits)
   )                    # end group 1
   \s*                  # optional whitespace
   (                    # Group 2
       (?<=\s)             # must be preceded by whitespace
       for                 # literal 'for'
       (?=\s|$)            # must be followed by whitespace or end of string
     |                     # OR match nothing
   )                    # end group 2
   \s*                  # optional whitespace
   (                    # Group 3
       (?<=\s)             # must be preceded by whitespace
       [[:alpha:]]+        # one or more letters
     |                     # OR match nothing
   )                    # end group 3
 /x;

for (@samples) {
    if (/$regex/) {
        print "'$1', '$2', '$3'\n";
    }
}

Output:

'12345', '', 'forever'
'32156', 'for', ''
'32156', 'for', 'today'
'abcdefghijk', 'for', ''
'abcdefghijk', '', ''
'abcdefghijk', '', 'end'
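
The empty strings in that output come from the "OR match nothing" branches: a group written as (...|) always takes part in the match, whereas a group made optional with ? is left undefined when it is skipped. A small sketch of the difference, using a simplified pattern that is only an illustration and not the regex above:

```perl
use strict;
use warnings;

# Optional group: it does not participate when 'for' is absent, so $1 is undef.
my $optional  = 'weather' =~ /weather(\s+for)?/ ? $1 : 'no match';

# Empty alternative: the group always participates, so $1 is '' instead.
my $empty_alt = 'weather' =~ /weather(\s+for|)/ ? $1 : 'no match';

print defined($optional)  ? "optional: defined\n"  : "optional: undef\n";
print defined($empty_alt) ? "empty alt: defined\n" : "empty alt: undef\n";
# prints:
# optional: undef
# empty alt: defined
```

This is why the test case can interpolate $2 and $3 directly under use warnings without triggering "uninitialized value" warnings.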
