What Perl regex can match the contiguous subset of digits 12345?

Developer https://www.devze.com 2023-01-21 12:24 Source: web
I would like a Perl regex that matches any contiguous subset of the string '12345'.

I'm probably just having a brain-freeze, but this is my test code and current best regex. I can see how to brute-force the situation by adding alternatives, but I'm wondering which elegant alternative I'm missing. [I don't specifically need captures for the digits; I have left the sample regex without non-capturing parentheses to make it slightly less cluttered.]

Test case:

use strict;
use warnings;

my @good = qw( 1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5);
my @bad  = qw( 0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25
               35 21 32 43 54 543 5432 54321);

my $qr = qr/^(1?(2?(3(4(5)?)?)?)?)$/;   # 3 'good', 3 'bad' failures
#my $qr = qr/^(1?(2(3(4(5)?)?)?)?)$/;   # 6 'good' failures.
my $fail = 0;

foreach my $opt (@good)
{
    printf "GOOD %d: $opt - missed by regex\n", ++$fail if ($opt !~ /$qr/);
}

foreach my $opt (@bad)
{
    printf "BAD %d: $opt - allowed by regex\n", ++$fail if ($opt =~ /$qr/);
}

print(($fail == 0) ? "PASS\n" : "FAIL\n");

Sample outputs:

Case 1 (commented out):

GOOD 1: 3 - missed by regex
GOOD 2: 34 - missed by regex
GOOD 3: 345 - missed by regex
GOOD 4: 4 - missed by regex
GOOD 5: 45 - missed by regex
GOOD 6: 5 - missed by regex
FAIL

Case 2 (active):

GOOD 1: 4 - missed by regex
GOOD 2: 45 - missed by regex
GOOD 3: 5 - missed by regex
BAD 4: 13 - allowed by regex
BAD 5: 134 - allowed by regex
BAD 6: 1345 - allowed by regex
FAIL

So, can you write a nice simple, symmetric regex that matches what I want and not what I don't?


This regex lets the test case pass, but isn't as elegant as I was hoping for:

my $qr = qr/^((1?(2(3(4(5)?)?)?)?)|(3?(4(5)?)?)|5)$/;

Test case with Justin's solution

use strict;
use warnings;

my @good = qw( 1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5);
my @bad  = qw( 0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25
               35 21 32 43 54 543 5432 54321 11 122 1233 1223 12234);

#my $qr = qr/^(1?(2?(3(4(5)?)?)?)?)$/;   # 3 'good', 3 'bad' failures
#my $qr = qr/^(1?(2(3(4(5)?)?)?)?)$/;    # 6 'good' failures.
#my $qr = qr/^((1?(2(3(4(5)?)?)?)?)|(3?(4(5)?)?)|5)$/;  # Passes

# Ysth's solution - passes
#my $qr = qr/^[12345](?:(?<=1)2|(?<=2)3|(?<=3)4|(?<=4)5)*$/;

my $fail = 0;

foreach my $opt (@good)
{
    printf "GOOD %d: $opt - missed by regex\n", ++$fail if ('12345' !~ /$opt/);
    #printf "GOOD %d: $opt - missed by regex\n", ++$fail if ($opt !~ /$qr/);
}

foreach my $opt (@bad)
{
    printf "BAD %d: $opt - allowed by regex\n", ++$fail if ('12345' =~ /$opt/);
    #printf "BAD %d: $opt - allowed by regex\n", ++$fail if ($opt =~ /$qr/);
}

print(($fail == 0) ? "PASS\n" : "FAIL\n");


Reverse the match:

'12345' =~ /$opt/
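
A minimal sketch of the reversed match, assuming the candidate could someday contain regex metacharacters or be empty. The sub name is_contiguous is hypothetical. \Q...\E quotes any metacharacters, and the length test comes first because an empty pattern has a special meaning in Perl (it reuses the last successfully matched pattern).

```perl
use strict;
use warnings;

# Hypothetical helper: true if $opt is a non-empty contiguous run of $whole.
sub is_contiguous {
    my ($opt, $whole) = @_;
    # length() is checked first so an empty $opt never reaches the match.
    return length($opt) && $whole =~ /\Q$opt\E/ ? 1 : 0;
}

for my $opt (qw(234 13 .)) {
    printf "%-4s %s\n", $opt, is_contiguous($opt, '12345') ? 'ok' : 'rejected';
}
```

Without \Q...\E the candidate "." would be accepted, since an unquoted dot matches any digit.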


Here's a revised version of Justin's idea:

index('12345', $opt) >= 0;

Or, if you need to exclude the empty string:

index('12345', $opt) >= 0 and length $opt;

This way, you don't need to check $opt for regex metachars. I'm not sure which version would be faster.
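
One way to settle the speed question is to measure it. The following is only a sketch using the core Benchmark module; the sub names by_regex and by_index are hypothetical, and the results will depend on the Perl version and the inputs, so no timings are claimed here.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my @opts = qw(1 12 123 1234 12345 2 23 234 2345 3 34 345 4 45 5
              0 6 13 134 1345 145 15 124 1245 125 1235 24 245 25);

# Hypothetical wrappers around the two approaches being compared.
sub by_regex { my ($opt) = @_; return length($opt) && '12345' =~ /\Q$opt\E/ ? 1 : 0 }
sub by_index { my ($opt) = @_; return length($opt) && index('12345', $opt) >= 0 ? 1 : 0 }

# Sanity check first: both versions must agree on every sample.
for my $opt (@opts) {
    die "disagree on $opt\n" if by_regex($opt) != by_index($opt);
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    regex => sub { by_regex($_) for @opts },
    index => sub { by_index($_) for @opts },
});
```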


/^[12345](?:(?<=1)2|(?<=2)3|(?<=3)4|(?<=4)5)*\z/

Sorry, got it wrong twice. This should do it. The explicit list of all possible matches is going to be faster, though.
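
To spell out how the lookbehind version works: the first character may be any of the five digits, and every later digit is accepted only if the character immediately before it is its predecessor in '12345'. Below is a sketch that builds the same pattern from the target string. It assumes every character in the string is distinct (with repeated characters the approach no longer works), and build_run_regex is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: build the lookbehind pattern for a string whose
# characters are all distinct (assumption; repeated characters break it).
sub build_run_regex {
    my ($whole) = @_;
    my @c     = map { quotemeta } split //, $whole;
    my $first = '(?:' . join('|', @c) . ')';
    # For each adjacent pair (a, b): allow b only when a comes right before it.
    my $steps = join '|', map { "(?<=$c[$_ - 1])$c[$_]" } 1 .. $#c;
    return qr/^$first(?:$steps)*\z/;
}

my $qr = build_run_regex('12345');
print "$_: ", (/$qr/ ? "match" : "no match"), "\n" for qw(234 1223 54);
```

For '12345' this produces the same alternation of lookbehinds as the hand-written regex above.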


Do you want something better than

/\b(1|12|123|1234|12345|2|23|234|2345|3|34|345|4|45|5)\b/

?
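
If the explicit list is the way to go, it does not have to be typed by hand. Here is a sketch that generates the alternation from the target string. The name build_alternation is hypothetical, and the sketch anchors with ^ and \z rather than \b, on the assumption that the whole input should be a contiguous run, as in the question's test case.

```perl
use strict;
use warnings;

# Hypothetical helper: alternation of every non-empty contiguous substring.
sub build_alternation {
    my ($whole) = @_;
    my (%seen, @subs);
    for my $start (0 .. length($whole) - 1) {
        for my $len (1 .. length($whole) - $start) {
            my $s = substr $whole, $start, $len;
            # Skip duplicates (possible when $whole repeats characters).
            push @subs, quotemeta($s) unless $seen{$s}++;
        }
    }
    my $alt = join '|', @subs;
    return qr/^(?:$alt)\z/;
}

my $qr = build_alternation('12345');
print "$_: ", (/$qr/ ? "match" : "no match"), "\n" for qw(345 135 5);
```

A string of length n yields n(n+1)/2 alternatives, 15 for '12345', so this stays manageable only for short targets.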
