
Why can't I match a substring that may appear 0 or 1 times using /(subpattern)?/

Developers https://www.devze.com 2023-02-14 18:54 Source: Internet

The original string is like this:

checksession ok:6178 avg:479 avgnet:480 MaxTime:18081 fail1:19

The last part, "fail1:19", may appear 0 or 1 times. I tried to capture the number after "fail1:", which is 19, using this:

($reg_suc, $reg_fail) = ($1, $2) if $line =~ /^checksession\s+ok:(\d+).*(fail1:(\d+))?/;

It doesn't work. The $2 variable is empty even when "fail1:19" is present. If I delete the "?", the pattern matches only when the "fail1:19" part exists, and $2 is then "fail1:19". But when the "fail1:19" part is missing, the match fails and neither $1 nor $2 is set. This is incorrect.

How can I rewrite this pattern to capture the two numbers correctly? That is, when the "fail1:19" part exists, both numbers should be recorded, and when it doesn't exist, only the number after "ok:" should be recorded.


First, the number in the fail field ends up in $3, not $2, because capture variables are numbered by the position of each opening parenthesis. Second, as codaddict shows, the .* construct is greedy, so it consumes the fail1:... part as well. Third, you can avoid the numbered variables altogether like this:

my $line = "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081 fail1:19";
if(my ($reg_suc, $reg_fail, $addend)
    = $line =~ /^checksession\s+ok:(\d+).*?(fail1:(\d+))?$/
) {
    warn "$reg_suc\n$reg_fail\n$addend\n";
}
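The first two points can be checked side by side. The following is a minimal sketch rather than part of the original answer; the array names @greedy and @lazy are illustrative only. It runs the question's pattern and the corrected one against the same line and prints what each capture group receives.

```perl
use strict;
use warnings;

my $line = "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081 fail1:19";

# Question's pattern: the greedy .* runs to the end of the string, and
# because the optional group may match zero times, nothing forces the
# regex to backtrack and give "fail1:19" back.
my @greedy = $line =~ /^checksession\s+ok:(\d+).*(fail1:(\d+))?/;

# Corrected pattern. Groups are numbered by opening parenthesis:
#   1 = (\d+) after "ok:", 2 = (fail1:(\d+)), 3 = the inner (\d+)
my @lazy = $line =~ /^checksession\s+ok:(\d+).*?(fail1:(\d+))?$/;

# Show unset groups as "undef" instead of triggering warnings.
print "greedy: ", join(", ", map { defined $_ ? $_ : "undef" } @greedy), "\n";
print "lazy:   ", join(", ", map { defined $_ ? $_ : "undef" } @lazy), "\n";
```

With the greedy pattern only the first group is set; with the lazy, anchored pattern the second group holds "fail1:19" and the third holds the bare number.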


Try the regex:

^checksession\s+ok:(\d+).*?(fail1:(\d+))?$


Changes made:

  • .* in the middle has been made non-greedy and
  • $ (end anchor) has been added.

As a result of these changes, .*? consumes as little as possible, and the end anchor forces the regex to match through to the end of the string, so fail1:number is matched when present.
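If only the number is wanted, one possible variation is to make the optional group non-capturing with (?:...), so the fail count lands in the second capture instead of the third. This is a sketch under that assumption; parse_checksession is a hypothetical helper name, not something from the thread.

```perl
use strict;
use warnings;

# parse_checksession is a hypothetical helper name used for illustration.
sub parse_checksession {
    my ($line) = @_;
    # (?: ... ) groups without capturing, so the fail count is capture 2.
    my ($ok, $fail) = $line =~ /^checksession\s+ok:(\d+).*?(?:fail1:(\d+))?$/
        or return;
    return ($ok, $fail);    # $fail is undef when the fail1 field is absent
}

for my $line (
    "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081 fail1:19",
    "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081",
) {
    my ($ok, $fail) = parse_checksession($line);
    print "ok=$ok fail=", (defined $fail ? $fail : "none"), "\n";
}
```

Returning undef for a missing field, rather than an empty string, lets the caller distinguish "no fail1 field" from a recorded value.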


I think this is one of the few cases where a split is actually more robust than a regex:

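my @bar = (
    "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081 fail1:19",
    "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081",
);
for my $line (@bar) {
    # Split on whitespace; each field after the first is "name:value".
    my @fields = split ' ', $line;
    # Keep only the value part of the "ok" and "fail1" fields.
    my $reg_suc  = (split /:/, $fields[1])[1];
    # The sixth field is missing when there is no fail1 part.
    my $reg_fail = defined $fields[5] ? (split /:/, $fields[5])[1] : '';
    print "$reg_suc $reg_fail\n";
}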


I try to avoid the non-greedy modifier. It often bites back. Kudos for suggesting split, but I'd go a step further:

my %rec = split /\s+|:/, ( $line =~ /^checksession (.*)/ )[0];
print "$rec{ok} $rec{fail1}\n";
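When the fail1 field is missing, $rec{fail1} is undefined and the print above emits a warning under use warnings. A minimal sketch of one way to handle that follows; parse_fields is a hypothetical helper name, and defaulting a missing fail1 to 0 is an assumption, not something the thread specifies.

```perl
use strict;
use warnings;

# parse_fields is a hypothetical helper name used for illustration.
sub parse_fields {
    my ($line) = @_;
    # Drop the leading "checksession" and keep the rest of the line.
    my ($rest) = $line =~ /^checksession\s+(.*)/ or return;
    # Splitting on whitespace or ":" yields name, value, name, value, ...
    my %rec = split /\s+|:/, $rest;
    return \%rec;
}

for my $line (
    "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081 fail1:19",
    "checksession ok:6178 avg:479 avgnet:480 MaxTime:18081",
) {
    my $rec = parse_fields($line) or next;
    # Assumption: treat a missing fail1 field as 0 failures.
    my $fail = exists $rec->{fail1} ? $rec->{fail1} : 0;
    print "$rec->{ok} $fail\n";
}
```

This relies on every field having exactly one colon; a field with no colon or several would shift the key/value pairing.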