How to match only once in a regex in Perl

Developer https://www.devze.com 2023-01-20 20:26 Source: Internet
$line = " TEST: asdas :asd asdasad s";

if ($line =~ /(.*):(.*)/
{
  print "$1  = $2 "
}

I was expecting TEST =asdas :asd asdasad s

But it's not working. What is the issue?


The correct way would be:

/([^:]+):(.*)/

or

/(.+?):(.*)/

This way, you're not matching "anything" on the left. You're matching "one or more non-colon characters" in the first example, or "matching the shortest possible string of any characters followed by a colon" in the second.
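As a quick check of that difference, here is a minimal sketch that runs the question's greedy pattern and both suggested patterns against the same line. The helper name `split_on_colon` is made up for this illustration; it is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: return the two captures for a pattern,
# or an empty list when the pattern does not match.
sub split_on_colon {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? ($1, $2) : ();
}

my $line = " TEST: asdas :asd asdasad s";

my @greedy  = split_on_colon(qr/(.*):(.*)/,    $line);  # pattern from the question
my @negated = split_on_colon(qr/([^:]+):(.*)/, $line);  # one or more non-colons
my @lazy    = split_on_colon(qr/(.+?):(.*)/,   $line);  # shortest possible prefix

print "greedy:  [$greedy[0]] [$greedy[1]]\n";    # [ TEST: asdas ] [asd asdasad s]
print "negated: [$negated[0]] [$negated[1]]\n";  # [ TEST] [ asdas :asd asdasad s]
print "lazy:    [$lazy[0]] [$lazy[1]]\n";        # [ TEST] [ asdas :asd asdasad s]
```

Both fixes split at the first colon. The leading space in each capture comes from the input itself and would need a separate trim.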

The even better way is to not use a regex. Use split.

my ($left, $right) = split(/:/, $line, 2);

The ,2 says "I want at most two fields".
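A small sketch of the split approach on the question's input follows. The whitespace trim at the end is an addition for illustration, assuming the surrounding spaces are unwanted; the original answer does not include it.

```perl
use strict;
use warnings;

my $line = " TEST: asdas :asd asdasad s";

# With a limit of 2, everything after the first colon stays in one piece.
my ($left, $right) = split /:/, $line, 2;

# Without a limit, the second colon produces a third field.
my @all_fields = split /:/, $line;

# Assumption: leading and trailing whitespace is not wanted.
s/^\s+|\s+$//g for $left, $right;

print "$left = $right\n";                      # TEST = asdas :asd asdasad s
print scalar(@all_fields), " fields without a limit\n";   # 3 fields without a limit
```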


The issue, as others have said, is that (.*) greedily matches everything except the line ending. What they don't tell you is that once the regex engine has matched everything up to the end of the line, it has to backtrack in order to satisfy the ':' condition.

So after it has swallowed up all the non-linefeed characters, it starts backing up. As it is now going in reverse, the first colon it finds is the ':' right before 'asd'. The colon having been matched, it applies the second group to all non-linefeed characters, which it satisfies.
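One way to observe where the engine ends up is to compare the offset at which the colon matched with the offsets of the first and last colons in the string. This is only a sketch; it relies on Perl's built-in `@+` array, where `$+[1]` holds the offset just past capture group 1.

```perl
use strict;
use warnings;

my $line = " TEST: asdas :asd asdasad s";

my $first_colon = index($line, ':');    # offset of the first colon
my $last_colon  = rindex($line, ':');   # offset of the last colon

my $greedy_colon;
if ($line =~ /(.*):(.*)/) {
    # Group 1 ends exactly where the literal ':' was matched.
    $greedy_colon = $+[1];
}

print "first colon at $first_colon, last colon at $last_colon\n";   # 5 and 13
print "greedy (.*) gave way at offset $greedy_colon\n";             # 13
```

The greedy group ends at the last colon, not the first, which is what backing up from the end of the line predicts.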

Whenever you can, you want to avoid backtracking in regexes. Since you want it to match the first colon, nothing before that colon should be a colon. So the deterministic expression, which never has to back up to find the colon, would be:

([^:]+):(.*)

Once you've seen the first colon, the greedy match is fine. However, if you had a string of spaces and non-spaces and you wanted to match up until the last non-space, thus trimming the string, you can't really specify that in a manner that won't backtrack, because whether a given character belongs in the capture depends on what follows it in the rest of the string.

([^:]+):(.*\S)

When it gets to the end of input, it backtracks for the non-space that it still hasn't matched. And when it finds that, it terminates the capture.
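To see the trimming effect, here is a minimal sketch using a made-up input line with trailing spaces (the `$padded` string is an assumption for illustration, not from the question).

```perl
use strict;
use warnings;

# Hypothetical input: same shape as the question's line, plus trailing spaces.
my $padded = "TEST: asdas :asd   ";

my ($key, $value);
if ($padded =~ /([^:]+):(.*\S)/) {
    # .* first runs to the end of the string, then gives characters back
    # until \S can match the final non-space character.
    ($key, $value) = ($1, $2);
}

print "[$key] [$value]\n";   # [TEST] [ asdas :asd]
```

The trailing spaces are dropped from the second capture, while the space right after the colon is still kept.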

Of course this is a trivial example, and alternative expressions can reduce backtracking. You might know that only single space characters will be accepted, so you can craft an expression that will at most backtrack once, but only to conclude the match:

([^:]+):((?:\S| \S)+)

Here it looks at the next character: if it's not a space, no problem; if it is a space, then only one more character needs to be read in order to determine whether it's a keeper. Since space-followed-by-non-space is the last alternative, when it fails the group ends and the match completes.
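The sketch below tries that pattern on two made-up inputs to show both the trimming and the cost of the single-space assumption. The sub name `value_after_colon` and both sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return the second capture, or undef on no match.
sub value_after_colon {
    my ($line) = @_;
    return $line =~ /([^:]+):((?:\S| \S)+)/ ? $2 : undef;
}

# Single spaces only: trailing spaces are left out of the capture.
my $trimmed = value_after_colon("TEST: asdas :asd   ");

# A double space inside the value ends the capture early.
my $cut_short = value_after_colon("KEY: a  b");

print "[$trimmed]\n";     # [ asdas :asd]
print "[$cut_short]\n";   # [ a]
```

The second result shows why this only works when you know that runs of two or more spaces never occur inside the value.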

This post from Regex Guru has a little more on this.


Two problems:

  1. You need a closing parenthesis, ), at the end of your if condition
  2. You want a non-greedy expression to match the least amount before the first colon (:)

Try $line =~ m/(.*?):(.*)/ - note the .*? - this means match the minimum required. Normally .* means match the maximum possible.


Making the first .* non-greedy will also work:

if ($line =~ /(.*?):(.*)/) {
  print "$1  = $2 "
}


$line = " TEST: asdas :asd asdasad s";

if ($line =~ /(.*?):(.*)/)
{
    print "$1  = $2 "
}

Use the above instead. Here (.*?) is a non-greedy match, so it matches only up to the first ':'.
