开发者

Why does my regex fail when the number ends in 0?

开发者 https://www.devze.com 2022-12-27 00:49 出处:网络
This is a really basic regex question but since I can\'t seem to figure out why the match is failing in certain circumstances I figured I\'d post it to see if anyone else can point out what I\'m missi

This is a really basic regex question but since I can't seem to figure out why the match is failing in certain circumstances I figured I'd post it to see if anyone else can point out what I'm missing.

I'm trying to pull out the 2 sets of digits from strings of the form:

12309123098_102938120938120938
1321312_103810312032123
123123123_10983094854905490
38293827_1293120938129308

I'm using the following code to process each string:

if($string && $string =~ /^(\d)+_(\d)+$/) {
    if(IsInteger($1) && IsInteger($2)) { print "success ('$1','$2')"; }
    else { print "fail"; }
}

Where the IsInterger() function is as follows:

sub IsInteger {
    my $integer = shift;
    if($integer && $integer =~ /^\d+$/) { return 1; }
    return;
}

This function seems to work most of the time but fails on the following for some reason:

1287123437_1268098784380
1287123437_1267589971660

Any ideas on why these fail while others succeed?开发者_StackOverflow社区 Thanks in advance for your help!


This is an add-on to the answers from unicornaddict and ZyX: what are you trying to match?

If you're trying to match the sequences left and right of '_', unicorn addict is correct and your regex needs to be ^(\d+)_(\d+)$. Also, you can get rid of the first qualifier and the 'IsIntrger()` function altogether - you already know it's an integer - it matched (\d+)

if ($string =~ /^(\d+)_(\d+)$/) {
    print "success ('$1','$2')";
} else {
    print "fail\n";
}

If you're trying to match the last digit in each and wondering why it's failing, it's the first check in IsInteger() ( if($intger && ). It's redundant anyway (you know it's an integer) and fails on 0 because, as ZyX notes - it evaluates to false.

Same thing applies though:

if ($string =~ /^(\d)+_(\d)+$/) {
    print "success ('$1','$2')";
} else {
    print "fail\n";
}

This will output success ('8','8') given the input 12309123098_102938120938120938


Because you have 0 at the end of the second string, (\d)+ puts only the last match in the $N variable, string "0" is equivalent to false.


When in doubt, check what your regex is actually capturing.

use strict;
use warnings;

my @data = (
    '1321312_103810312032123',
    '123123123_10983094854905490',
);

for my $s (@data){
    print "\$1=$1 \$2=$2\n" if $s =~ /^(\d)+_(\d)+$/;
    # Output:
    # $1=2 $2=3
    # $1=3 $2=0
}

You probably intended the second of these two approaches.

(\d)+  # Repeat a regex group 1+ times,
       # capturing only the last instance.

(\d+)  # Capture 1+ digits.

In addition, both in your main loop and in IsInteger (which seems unnecessary, given the initial regex in the main loop), you are testing for truth rather than something more specific, such as defined or length. Zero, for example, is a valid integer but false.


Shouldn't + be included in the grouping:

^(\d+)_(\d+)$ instead of ^(\d)+_(\d)+$


Many people have commented on your regex, but the problem you had in your IsInteger (which you really don't need for your example). You checked for "truth" when you really want to check for defined:

sub IsInteger {
    my $integer = shift;
    if( defined $integer && $integer =~ /^\d+$/) { return 1; }
    return;
}

You don't need most of the infrastructure in that subroutine though:

sub IsInteger {
    defined $_[0] && $_[0] =~ /^\d+$/
}
0

精彩评论

暂无评论...
验证码 换一张
取 消