Extract nth occurrence with Perl Regex

开发者 https://www.devze.com 2023-01-31 02:47 Source: web

I am trying to find the best way to parse a line that looks like this:

Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah

I just want to extract the stuff between the 6th and 7th vertical bar |

I tried something like

if ($line =~ /^(.*\|){6}(\w*)\|/ ) {  
    print $2;  
}

The problem is that the first part seems to be matching the longest sequence possible because of .*, perhaps there is something different I should be using. Between the vertical bars, there are alphanumeric characters, spaces and punctuation.

Should I be matching the shortest between them?


You can use .*? instead; the trailing ? makes the * non-greedy, so it prefers matching fewer characters to more.

This could still match in the wrong place if the field you want has non-word characters: when (\w*)\| fails at the 7th field, the regex backtracks and lets .*? run across a |. To prevent this you can either explicitly say anything-but-| ( ([^|]*\|){6} ) or disable backtracking for that part ( ((?>.*?\|)){6} ).
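To make the difference concrete, here is a minimal sketch comparing the three patterns. The second input line and the helper name seventh_field are made up for illustration, and the capture is widened to [^|]* in the two stricter patterns so a field containing spaces can be returned at all.

```perl
use strict;
use warnings;

# Sample line from the question; the field after the 6th bar is "and".
my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Hypothetical line whose wanted field contains a space (a non-word character).
my $tricky = 'a|b|c|d|e|f|two words|h|i';

# Negated character class: each repetition consumes exactly one field.
my $class_re  = qr/^(?:[^|]*\|){6}([^|]*)\|/;
# Atomic group: once a field and its bar are matched, they are not revisited.
my $atomic_re = qr/^(?>.*?\|){6}([^|]*)\|/;
# Lazy quantifier with the \w* capture from the question.
my $lazy_re   = qr/^(?:.*?\|){6}(\w*)\|/;

# Hypothetical helper: returns the captured field, or undef if there is no match.
sub seventh_field {
    my ($re, $input) = @_;
    my ($field) = $input =~ $re;
    return $field;
}

for my $input ($line, $tricky) {
    for my $re ($class_re, $atomic_re, $lazy_re) {
        my $field = seventh_field($re, $input);
        print defined $field ? "[$field]" : "[no match]", "\n";
    }
}
```

On the question's line all three patterns return "and". On the made-up line the first two return "two words", while the lazy pattern backtracks across a bar and returns "h" from the next field.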

Or you could just use split:

# Test with defined() rather than truth, so a field of "0" or "" is not skipped.
if ( defined( my $seventh = ( split /\|/, $line, 8 )[6] ) ) {
    print $seventh;
}

(the 8 is optional; it tells split to stop after the 7th | and leave the rest of the line unsplit in the eighth element)
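A quick way to see what the limit does is to compare the two calls on the question's line. This is only a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# With a limit of 8, split returns at most 8 elements; the last one keeps
# the remainder of the line, bars included.
my @limited = split /\|/, $line, 8;

# Without a limit, every bar is a split point.
my @all = split /\|/, $line;

print scalar(@limited), " elements, last: $limited[-1]\n";
print scalar(@all), " elements, index 6: $all[6]\n";
```

Either way, index 6 holds the same field; the limit only saves the work of splitting the tail.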


Use split. Something like my @fields = split /\|/, $str should work. Then you just index the field you're interested in. Empty fields in the middle are preserved; trailing empty fields are dropped unless you pass a negative limit. The | must be escaped because it is the regex alternation operator.
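One caveat about empty fields is worth checking: Perl's split keeps empty fields in the middle of the line but strips them from the end unless the limit is negative. A small sketch, using a made-up input string:

```perl
use strict;
use warnings;

# Hypothetical input with one empty field in the middle and three at the end.
my $str = 'a||c|||';

# Default: the interior empty field is kept, trailing empty fields are removed.
my @default = split /\|/, $str;

# A negative limit keeps the trailing empty fields as well.
my @keep_all = split /\|/, $str, -1;

print scalar(@default), " fields by default\n";
print scalar(@keep_all), " fields with limit -1\n";
```

This only matters if the field you index might sit in a run of empty fields at the end of the line; in that case the default call returns undef for it rather than an empty string.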
