Perl: Multiple global "or"-separated regex conditions in while block leads to an infinite loop?

Developer https://www.devze.com 2023-01-05 06:27 Source: web
I'm learning Perl and noticed a rather peculiar quirk -- attempting to match one of multiple regex conditions in a while loop results in that loop going on for infinity:

#!/usr/bin/perl

my $hivar = "this or that";

while ($hivar =~ m/this/ig || $hivar =~ m/that/ig) {
        print "$&\n";
}

The output of this program is:

this
that
that
that
that
[...]

I'm wondering why this is? Are there any workarounds that are less clumsy than this:

#!/usr/bin/perl

my $hivar = "this or that";

while ($hivar =~ m/this|that/ig) {
        print "$&\n";
}

This is a simplification of a real-world problem I am encountering, and while I am interested in this from a practical standpoint, I would also like to know what is triggering this behavior behind the scenes. This is a question that doesn't seem to be very Google-compatible.

Thanks!

Tom


The thing is that there's a hidden value associated with each string, not with each match, that controls where a /g match will attempt to continue. It is accessible through pos($string). What happens is:

  1. pos($hivar) is unset, so the search starts at position 0. /this/ matches at position 0 and sets pos($hivar) to 4. The second match isn't attempted because the or operator is already true. $& becomes "this" and gets printed.
  2. pos($hivar) is 4, /this/ fails to match because there's no "this" at position 4 or beyond. The failing match resets pos($hivar), so the next search starts at 0 again.
  3. /that/ matches at position 8 and sets pos($hivar) to 12. $& becomes "that" and gets printed.
  4. pos($hivar) is 12, /this/ fails to match because there's no "this" at position 12 or beyond. The failing match resets pos($hivar), so the next search starts at 0 again.
  5. /that/ matches at position 8 and sets pos($hivar) to 12. $& becomes "that" and gets printed.

and steps 4 and 5 repeat indefinitely.
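
If you want to watch this happen, here is a minimal sketch that logs pos() around each evaluation of the same condition. The helper name trace_pos and the iteration cap are my own additions for illustration (the cap is only there so the demo terminates), and the // operator assumes Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# trace_pos is a hypothetical helper, not part of the original code.
# It evaluates the looping condition at most $max_iterations times and
# records pos() before and after each successful evaluation.
sub trace_pos {
    my ($str, $max_iterations) = @_;
    my @log;
    for my $i (1 .. $max_iterations) {
        # pos() is undef until a /g match has succeeded on this string.
        my $before = pos($str) // 'undef';

        # Same condition as in the question, evaluated in scalar context.
        last unless $str =~ m/this/ig || $str =~ m/that/ig;

        push @log, sprintf("iteration %d: pos before=%s, matched '%s', pos after=%d",
                           $i, $before, $&, pos($str));
    }
    return @log;
}

print "$_\n" for trace_pos("this or that", 4);
```

From the second iteration on, the log should show /that/ being found again each time with pos ending at 12, because the failed /this/ match has discarded the saved position just before /that/ runs.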

Adding the c regex flag to both matches (m/this/igc and m/that/igc), which tells the engine not to reset pos on a failed match, solves the problem in the example code you provided, but it might or might not be the ideal solution to a more complex problem.
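
A rough sketch of that fix, wrapped in a hypothetical helper (collect_matches_gc is a name I made up for illustration) so the result is easy to inspect. The second call shows one reason it may not suit a more complex problem: because both patterns share a single pos, a match on the first pattern can skip past an earlier occurrence of the second.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# collect_matches_gc is a hypothetical helper, not from the original code.
# With /c, a failed match leaves pos() where it was, so the loop ends
# once neither pattern matches beyond the current position.
sub collect_matches_gc {
    my ($str) = @_;
    my @found;
    while ($str =~ m/this/igc || $str =~ m/that/igc) {
        push @found, $&;
    }
    return @found;
}

print join(", ", collect_matches_gc("this or that")), "\n";
print join(", ", collect_matches_gc("that or this")), "\n";
```

The first call should collect "this" and then "that" and stop. In the second call, /this/ matches first near the end of the string, which moves pos past the earlier "that", so only "this" is collected. The single alternation m/this|that/ig from your question does not have that ordering problem, since it scans the string once from left to right.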
