I am trying to find the best way to parse a line that looks like this:
Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah
I just want to extract the text between the 6th and 7th vertical bars (|).
I tried something like:
if ($line =~ /^(.*\|){6}(\w*)\|/) {
    print $2;
}
The problem is that the first part seems to match the longest possible sequence because of .*, so perhaps there is something different I should be using. Between the vertical bars there are alphanumeric characters, spaces, and punctuation.
Should I be matching the shortest sequence between them instead?
You can use .*? instead, which modifies the * to prefer matching fewer times rather than more.
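To see the difference on the sample line from the question, here is a minimal sketch comparing the greedy and non-greedy forms. It uses a non-capturing group (?:...) for the repeated part so that the field is the only capture; that is a choice made for this illustration, not part of the original pattern.

```perl
use strict;
use warnings;

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Greedy: each .* grabs as much as it can, so the six repetitions drift
# to the right and the capture lands on a later field than intended.
my ($greedy) = $line =~ /^(?:.*\|){6}(\w*)\|/;

# Non-greedy: each .*? stops at the first | that lets the match proceed.
my ($lazy) = $line =~ /^(?:.*?\|){6}(\w*)\|/;

print "greedy: $greedy\n";   # greedy: blah
print "lazy:   $lazy\n";     # lazy:   and
```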
This could still match in the wrong place if the field you want contains non-word characters, because the regex engine can backtrack and let .*? run across a |. To prevent this, you can either match anything-but-| explicitly, with ([^|]*\|){6}, or disable backtracking for that part, with ((?>.*?\|)){6}.
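The following sketch shows the failure mode and both fixes on a made-up line (the sample data is hypothetical). Since the question says fields may contain spaces and punctuation, the two fixed patterns capture with [^|]* rather than \w*; that substitution is an assumption of this example.

```perl
use strict;
use warnings;

# Hypothetical line whose 7th field ("x-y") contains a non-word character.
my $line = 'a|b|c|d|e|f|x-y|blah|z';

# Lazy .*? can still backtrack across a |, so the capture slides
# to a later field that happens to consist of word characters.
my ($lazy) = $line =~ /^(?:.*?\|){6}(\w*)\|/;

# Negated class: each repetition is pinned to exactly one field.
my ($negated) = $line =~ /^(?:[^|]*\|){6}([^|]*)\|/;

# Atomic group: once a field and its | are matched, they are not revisited.
my ($atomic) = $line =~ /^(?>.*?\|){6}([^|]*)\|/;

print "lazy:    $lazy\n";      # lazy:    blah
print "negated: $negated\n";   # negated: x-y
print "atomic:  $atomic\n";    # atomic:  x-y
```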
Or you could just use split:
if ( my $seventh = ( split /\|/, $line, 8 )[6] ) {
    print $seventh;
}
(The 8 is optional; it tells split to stop splitting once it has passed the 7th |, leaving the rest of the line in one piece.)
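As a rough illustration of what the limit does, the sketch below splits the sample line with a limit of 8 and prints the pieces of interest. It also shows one caveat worth keeping in mind: a plain truth test skips fields that are empty or "0", so testing with defined may be safer if such values are possible.

```perl
use strict;
use warnings;

my $line = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

# Limit 8: at most 8 pieces, so everything after the 7th | stays together.
my @parts = split /\|/, $line, 8;
print scalar(@parts), "\n";   # 8
print "$parts[6]\n";          # and
print "$parts[7]\n";          # blah|blah|blah

# The 6th field is an empty string: false in a boolean test, but defined.
my $sixth = ( split /\|/, $line, 8 )[5];
print "sixth field is empty but present\n" if defined $sixth && $sixth eq '';
```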
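Use split. Something like my @fields = split /\|/, $str should work. Then you just index the field you're interested in. Empty fields in the middle of the line are preserved, although trailing empty fields are dropped unless you pass a negative limit. The | must be escaped because it is the regex alternation operator.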
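A short sketch of this approach on the line from the question, followed by a small made-up string ('a|b||') that shows how split treats trailing empty fields with and without a negative limit:

```perl
use strict;
use warnings;

my $str = 'Explicit|00|11|Hello World|12 3 134||and|blah|blah|blah';

my @fields = split /\|/, $str;
print scalar(@fields), "\n";   # 10
print "[$fields[5]]\n";        # [] (the empty field is kept)
print "$fields[6]\n";          # and

# Trailing empty fields are dropped by default; a negative limit keeps them.
my @default = split /\|/, 'a|b||';
my @keep    = split /\|/, 'a|b||', -1;
print scalar(@default), " vs ", scalar(@keep), "\n";   # 2 vs 4
```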