Do both alternatives match when using | in Perl regular expressions?

Developer (https://www.devze.com), 2023-04-01 13:36. Source: the web
I am confused about the regular expression below. Please help me to understand it.

my $test = "fred andor berry";
if ($test =~ /fred (and|or) berry/) {
    print "Matched!\n";
} else {
    print "Did not match!\n";
}

I thought it would match, but I get "Did not match!". If I add a + to the group, like this:

my $test = "fred andor berry";
if ($test =~ /fred (and|or)+ berry/) {
   print "Matched!\n";
} else {
   print "Did not match!\n";
}

Then it matches. I thought I could use and|or to match an expression containing "and", "or", or "andor". No?


The (and|or) part of the regex means: match 'and' or 'or', but not both. When you append the plus to that group, it can match one or more times. For example, "fred andandand berry" would also be a valid match for /fred (and|or)+ berry/.


While people tend to read a|b as "a or b" the | is not an OR operator; it's the alternation operator. It specifies a set of alternatives for what can match at that point. A more accurate reading would be "either 'a' or 'b' (but not both)".

When you write (and|or)+ you're adding the + quantifier, which means "one or more of the preceding atom." The effect is that instead of matching a single value which could be either "and" or "or", it will match a series of values, each of which could be either "and" or "or". It would match all of the following:

and
or
andor
orand
andorand
andandorororandorandand
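
As a quick check of the list above, here is a minimal sketch that tests each string between "fred " and " berry". The helper name matches_repeated and the extra non-matching word "nor" are assumptions added for illustration; they are not part of the original question.

```perl
use strict;
use warnings;

# Strings taken from the list above, plus "nor", which should fail.
my @middles = qw(and or andor orand andorand andandorororandorandand nor);

# Hypothetical helper: does "fred <middle> berry" match the quantified group?
sub matches_repeated {
    my ($middle) = @_;
    return "fred $middle berry" =~ /fred (?:and|or)+ berry/ ? 1 : 0;
}

for my $middle (@middles) {
    printf "%-26s %s\n", $middle,
        matches_repeated($middle) ? "matches" : "does not match";
}
```

Every entry from the list should report "matches", while "nor" should not, because the text right after "fred " has to start with one of the two alternatives.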

If you really want to match just "and", "or", and "andor" (though I don't know why you'd want to) you would write it like this:

(and|or|andor)    # capture
(?:and|or|andor)  # don't capture

depending on whether or not you wanted to capture the specific value matched. (Plain (...) creates a capturing grouping. (?:...) creates a non-capturing grouping.)
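
To see the capturing form in action, the sketch below runs the three-alternative pattern against a few inputs and prints what lands in $1. The helper name captured_word and the test strings are assumptions made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured word, or undef when nothing matches.
sub captured_word {
    my ($text) = @_;
    return $text =~ /fred (and|or|andor) berry/ ? $1 : undef;
}

for my $text ("fred and berry", "fred or berry",
              "fred andor berry", "fred andand berry") {
    my $word = captured_word($text);
    print "$text: ", defined $word ? "captured '$word'" : "no match", "\n";
}
```

For "fred andor berry" the engine first tries "and", fails to find " berry" after it, and backtracks through the remaining alternatives until "andor" succeeds. "fred andand berry" matches none of the three alternatives, so it is rejected, unlike with (and|or)+.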


The expression (and|or) will match "and" or "or", but not "andor". When you add the +, it accepts one or more consecutive matches of the same group, which allows it to match "andor". (First it matches "and", then "or".)
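
One side effect worth knowing: when a capturing group is repeated with +, Perl keeps only the last repetition in $1. The following sketch shows that, and then lists every repetition by re-scanning the middle section. The variable names $last, $middle and @steps are assumptions chosen for this example.

```perl
use strict;
use warnings;

my $test = "fred andor berry";

# A capturing group under + keeps only its last repetition in $1.
my $last = $test =~ /fred (and|or)+ berry/ ? $1 : undef;

# Capture the whole repeated run, then split it into its repetitions.
my ($middle) = $test =~ /fred ((?:and|or)+) berry/;
my @steps;
@steps = $middle =~ /and|or/g if defined $middle;

print "last repetition: $last\n";    # prints "last repetition: or"
print "all repetitions: @steps\n";   # prints "all repetitions: and or"
```

If you need the full matched run rather than just the final piece, wrapping the quantified group in an outer capture, as in the second pattern, is one way to get it.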


When matching an atom, it must come immediately after the previous atom.

There are two kinds of "or".

  • Exclusive or
  • Inclusive or

  • If | was an exclusive or, it would match if it finds either "or" or "and" immediately after fred.
  • If | was an inclusive or, it would match if it finds "or", "and" or both immediately after fred.

Both "and" and "or" cannot possibly be found immediately after "fred " at the same time, so | is obviously an exclusive or.


(and|or)+ means one or more occurrences, each of which is either "and" or "or". So it would also match andand, andorand, orand, ororororand, etc.

(and|or) means either "and" or "or". (Nice name picking.)

So it would match on

fred and berry

and on

fred or berry

If you want to continue with regexes, proper documentation can be found at

http://perldoc.perl.org/perlre.html
