How do I substitute overlapping matches with a Perl regex?

Developer https://www.devze.com 2023-03-16 23:36 Source: Internet
I want to find all occurrences of "BBB" in a string and substitute each of them with "D". For example, I have "ABBBBC" and want to produce "ADBC" and "ABDC" (first substitute the first BBB, then substitute the other BBB). Is there a nice way to do this in Perl?

$str = "ABBBBC";
for ( $str =~ m/B(?=BB)/g ) {
    # I match both the BBBs here, but how to substitute the relevant part?
}

I want to get this array: ('ADBC', 'ABDC'), which comes from changing either of the BBBs to a D. The string "ABBBBBC" would give me "ADBBC", "ABDBC" and "ABBDC".


To get overlapping matches, you have to play around with Perl's pos function.

pos SCALAR
pos
Returns the offset of where the last m//g search left off for the variable in question ($_ is used when the variable is not specified). Note that 0 is a valid match offset. undef indicates that the search position is reset (usually due to match failure, but can also be because no match has yet been run on the scalar).

pos directly accesses the location used by the regexp engine to store the offset, so assigning to pos will change that offset, and so will also influence the \G zero-width assertion in regular expressions. Both of these effects take place for the next match, so you can't affect the position with pos during the current match, such as in (?{pos() = 5}) or s//pos() = 5/e.

Setting pos also resets the matched with zero-length flag, described under Repeated Patterns Matching a Zero-length Substring in perlre.

Because a failed m//gc match doesn't reset the offset, the return from pos won't change either in this case. See perlre and perlop.
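As a small illustration of the behavior quoted above, the following sketch shows pos before a match, after a successful m//g, and after an explicit assignment. The variable names ($before, $after_first, $second) are made up for this illustration and do not come from the original post.

```perl
use strict;
use warnings;

my $str = "ABBBBC";

# No m//g match has run on $str yet, so pos() is undef.
my $before = defined(pos($str)) ? pos($str) : "undef";

# A successful m//g in scalar context leaves pos() just past the match.
my $after_first = ($str =~ /BBB/g) ? pos($str) : "no match";

# Assigning to pos() changes where the next m//g match starts, so the
# second match may overlap the first one.
pos($str) = 2;
my $second = ($str =~ /BBB/g) ? join("..", $-[0], $+[0]) : "no match";

print "$before $after_first $second\n";   # undef 4 2..5
```

The first match covers offsets 1 to 4; moving pos back to 2 lets the second match cover offsets 2 to 5, which overlaps the first.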

For example:

#! /usr/bin/env perl

use strict;
use warnings;

my $str = "ABBBBC";
my @replaced;
while ($str =~ m/^(.*)\G(.+?)BBB(.*)$/g ) {
  push @replaced, $1 . $2 . "D" . $3;
  pos($str) = length($1) + 1;
}

print "[", join("][" => @replaced), "]\n";

Output:

$ ./prog
[ADBC][ABDC]
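A possible alternative, closer to the loop in the question, is to find each start position with a zero-width lookahead and patch a copy of the string with substr. This is only a sketch: the helper name replace_each and its parameters are hypothetical, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): return one copy of
# $str per occurrence of $find, overlapping occurrences included, with
# only that occurrence replaced by $with.
sub replace_each {
    my ($str, $find, $with) = @_;
    my @replaced;
    # The lookahead consumes no characters, so the next search resumes
    # one character further on and overlapping occurrences are found.
    while ($str =~ /(?=\Q$find\E)/g) {
        my $copy = $str;
        # pos($str) is the offset where this occurrence starts.
        substr($copy, pos($str), length($find)) = $with;
        push @replaced, $copy;
    }
    return @replaced;
}

print join(", ", replace_each("ABBBBC",  "BBB", "D")), "\n";  # ADBC, ABDC
print join(", ", replace_each("ABBBBBC", "BBB", "D")), "\n";  # ADBBC, ABDBC, ABBDC
```

Because each replacement is made on a fresh copy, the original string and its match position are never disturbed inside the loop.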


local our @replaced;
'ABBBBC' =~ /^(.*)BBB(.*)\z(?{ push @replaced, $1.'D'.$2 })(?!)/s;
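The one-liner above relies on backtracking: (?!) always fails, so the engine is forced to try every way of splitting the string around "BBB", and the (?{ ... }) block runs each time the pattern reaches \z. A runnable sketch of the same idea, assuming a plain package variable declared with our instead of local our, might look like this:

```perl
use strict;
use warnings;

# Package variable filled from inside the (?{ ... }) code block.
our @replaced;

# (?!) fails unconditionally, so the overall match never succeeds; the
# push is a side effect that happens once per candidate position of BBB.
'ABBBBC' =~ /^(.*)BBB(.*)\z(?{ push @replaced, $1 . 'D' . $2 })(?!)/s;

print "[", join("][", @replaced), "]\n";   # [ABDC][ADBC]
```

Since the first (.*) is greedy, the rightmost occurrence is recorded first; apply reverse to @replaced if left-to-right order matters.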