
Why does a positive lookahead lead to captures in my Perl regex?

开发者 https://www.devze.com 2022-12-24 03:19 Source: Internet

I don't understand why this code works:

my $seq = 'GAGAGAGA';
my $regexp = '(?=((G[UCGA][GA]A)|(U[GA]CG)|(CUUG)))'; # zero-width match
while ($seq =~ /$regexp/g) { # match globally
    my $pos = pos($seq) + 1; # 1-based position of the zero-width match
    print "$1 position $pos\n";
}

I know this is a zero-width match and that it doesn't put the matched string in $&, but why does it put it in $1?

Thank you!


Matches are captured in $1 because of the capturing parentheses inside the lookahead: the assertion itself is zero-width, but the groups within it still capture. If you don't want capturing, then use

my $regexp = '(?=(?:(?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))';

or even better

my $regexp = qr/(?=(?:(?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))/;
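As a quick check of the non-capturing version, here is a minimal sketch that reuses the sequence from the question. The names @positions and $captured are only illustrative; the loop collects the 1-based positions and counts how often $1 is defined.

```perl
use strict;
use warnings;

my $seq    = 'GAGAGAGA';
# Same alternation as in the question, but every group is non-capturing.
my $regexp = qr/(?=(?:(?:G[UCGA][GA]A)|(?:U[GA]CG)|(?:CUUG)))/;

my @positions;      # illustrative name: 1-based start of each zero-width match
my $captured = 0;   # illustrative name: how many times $1 was defined

while ($seq =~ /$regexp/g) {
    # pos() is the 0-based offset of the zero-width match; +1 makes it 1-based.
    push @positions, pos($seq) + 1;
    $captured++ if defined $1;
}

print "positions: @positions\n";
print "times \$1 was set: $captured\n";
```

The positions are the same ones the capturing pattern reports, but the matched text is no longer available in $1, so this form fits when only the positions are needed.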

From the perlre documentation:

  • (?:pattern)
  • (?imsx-imsx:pattern)

This is for clustering, not capturing; it groups subexpressions like (), but doesn't make backreferences as () does. So

@fields = split(/\b(?:a|b|c)\b/)

is like

@fields = split(/\b(a|b|c)\b/)

but doesn't spit out extra fields. It's also cheaper not to capture characters if you don't need to.
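To make the "extra fields" remark concrete, here is a small illustration that is not part of perlre. The sample string and the added \s* around the delimiter are assumptions chosen to keep the output tidy.

```perl
use strict;
use warnings;

my $line = 'x a y b z';   # hypothetical input

# Non-capturing group: the delimiters are dropped.
my @without = split /\s*\b(?:a|b|c)\b\s*/, $line;

# Capturing group: split also returns each captured delimiter as a field.
my @with = split /\s*\b(a|b|c)\b\s*/, $line;

print join('|', @without), "\n";
print join('|', @with), "\n";
```

With the capturing group, the delimiters a and b appear between the real fields; with (?:...) only x, y and z come back.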

Any letters between ? and : act as flags modifiers as with (?imsx-imsx). For example,

/(?s-i:more.*than).*million/i

is equivalent to the more verbose

/(?:(?s-i)more.*than).*million/i
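The effect of the inline flags can be checked with a short sketch that is not part of perlre. The sample strings and labels are made up for illustration. Inside the group, s is on and i is off; outside it, the trailing /i applies and the dot does not match a newline.

```perl
use strict;
use warnings;

my $re      = qr/(?s-i:more.*than).*million/i;
my $verbose = qr/(?:(?s-i)more.*than).*million/i;

# Hypothetical test strings, one per behaviour.
my @samples = (
    [ 'newline inside group'   => "more\nthan a MILLION" ],
    [ 'uppercase inside group' => "MORE than a million"  ],
    [ 'newline outside group'  => "more than\na million" ],
);

my @results;
for my $sample (@samples) {
    my ($label, $text) = @$sample;
    my $short = $text =~ $re      ? 'match' : 'no match';
    my $long  = $text =~ $verbose ? 'match' : 'no match';
    push @results, "$label: $short / $long";
}

print "$_\n" for @results;
```

Only the first string matches, and both spellings of the pattern agree on every sample.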


Your regular expression contains capturing groups (...), which means the $1, $2, etc. variables will be populated with the results of those captures. This works inside lookahead assertions too, and inside lookbehind assertions as well: the assertion itself consumes no characters, but the groups within it still capture.

As with all captures, if you rewrite as (?:...) then the contents will not go into a capture variable.
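To see which capture variables the original pattern fills, here is a minimal sketch that uses the sequence and regex from the question. The @report array and the 'undef' placeholder text are illustrative choices. Groups are numbered by the position of their opening parenthesis, so $1 is the outer group and $2 to $4 are the three alternatives.

```perl
use strict;
use warnings;

my $seq    = 'GAGAGAGA';
my $regexp = qr/(?=((G[UCGA][GA]A)|(U[GA]CG)|(CUUG)))/;

my @report;   # illustrative name: one line per zero-width match
while ($seq =~ /$regexp/g) {
    my $pos = pos($seq) + 1;   # 1-based position
    # Alternatives that did not take part in the match leave their group undefined.
    my @groups = map { defined $_ ? $_ : 'undef' } ($1, $2, $3, $4);
    push @report, "pos $pos: @groups";
}

print "$_\n" for @report;
```

For this sequence, every match goes through the first alternative, so $1 and $2 both hold GAGA while $3 and $4 stay undefined.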
