Perl pattern match variable question

Developer https://www.devze.com 2023-02-02 22:42 Source: Internet
I'm trying to open a file, match a particular line, and then wrap HTML tags around that line. Seems terribly simple but apparently I'm missing something and don't understand the Perl matched pattern variables correctly.

I'm matching the line with this:

$line =~ m/(Number of items:.*)/i;

Which puts the entire line into $1. I then try to print out my new line like this:

print "<p>" . $1 . "<\/p>";

I expect it to print this:

<p>Number of items: 22</p>

However, I'm actually getting this:

</p>umber of items: 22

I've tried all kinds of variations - printing each bit on a separate line, setting $1 to a new variable, using $+ and $&, etc. and I always get the same result.

What am I missing?


You have an \r in your match, which when printed results in the malformed output.

edit: To explain further, chances are your file has Windows-style \r\n line endings. chomp removes only the \n, so the \r is left on the line and gets swallowed by your greedy .* into $1. When printed, \r moves the cursor back to the start of the line, so the closing </p> overwrites the first characters of the output, which is exactly the mangled result you are seeing.

You can remove the \r by adding something like

$line =~ tr/\015//d;
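
To make the effect visible without a terminal hiding it, here is a minimal sketch. The sample string is an assumption standing in for one line of your file, and $before, $clean and $after are names I made up for illustration. It shows that the capture carries a trailing \r until the tr/// deletion is applied.

```perl
use strict;
use warnings;

# Hypothetical sample line with a Windows-style CRLF ending.
my $line = "Number of items: 22\r\n";

chomp $line;    # with the default $/, this strips only the trailing "\n"

# The greedy .* cannot match "\n", but it does match "\r".
my ($before) = $line =~ m/(Number of items:.*)/i;

# Delete every carriage return (octal 015) from a copy, then match again.
(my $clean = $line) =~ tr/\015//d;
my ($after) = $clean =~ m/(Number of items:.*)/i;

printf "before: %d chars, ends with CR: %s\n",
    length($before), ($before =~ /\r\z/ ? "yes" : "no");
printf "after:  %d chars, ends with CR: %s\n",
    length($after), ($after =~ /\r\z/ ? "yes" : "no");

print "<p>" . $after . "</p>\n";
```

The first capture is one character longer than the second; that extra character is the carriage return.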


Can you provide a complete code snippet that demonstrates your problem? I'm not seeing it.

One thing to be cautious of is that $1 and friends refer to captures from the last successful match in that dynamic scope. You should always verify that a match succeeds before using one:

$line = "Foo Number of items: 97\n";
if ( $line =~ m/(Number of items:.*)/i ) {
    print "<p>" . $1 . "<\/p>\n";
}
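
As a rough illustration of why the check matters, the sketch below runs the same pattern against two made-up strings in one scope. The variable names are hypothetical. After the second match fails, $1 still holds the capture from the first one, whereas capturing in list context yields undef on failure.

```perl
use strict;
use warnings;

# Hypothetical inputs: only the first one contains the text we want.
my $first  = "Number of items: 22";
my $second = "Unrelated line";

$first =~ m/(Number of items:.*)/i;
my $seen_first = $1;

# This match fails, so $1 is left untouched from the previous success.
$second =~ m/(Number of items:.*)/i;
my $seen_second = $1;

# Capturing in list context returns an empty list on failure,
# so $guarded is undef instead of a stale value.
my ($guarded) = $second =~ m/(Number of items:.*)/i;

print "unchecked: $seen_second\n";
print "guarded:   ", (defined $guarded ? $guarded : "(no match)"), "\n";
```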


You've just learned (for future reference) how dangerous .* can be.

Having banged my head against similar unpleasantnesses, these days I like to be as precise as I can about what I expect to capture. Maybe

$line =~ m/(Number of items:\s+\d+)/;

Then I'm sure of not capturing the offending control character in the first place. Whatever Cygwin may be doing with Windows files, I can remain blissfully ignorant.
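
A small sketch of that idea, assuming the count is always a run of digits. wrap_items is a hypothetical helper name, not something from the question. Because \d+ stops at the last digit, the trailing \r never reaches $1, and no separate cleanup step is needed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the wrapped line, or undef if nothing matched.
sub wrap_items {
    my ($line) = @_;

    # \s+\d+ captures only whitespace and digits after the label,
    # so a trailing "\r" or "\n" is never part of $1.
    if ( $line =~ m/(Number of items:\s+\d+)/ ) {
        return "<p>" . $1 . "</p>";
    }
    return undef;
}

# Sample line with a Windows-style ending, as in the question.
my $html = wrap_items("Number of items: 22\r\n");
print "$html\n" if defined $html;
```

The trade-off is that a line whose count is not purely numeric will not match at all, which is usually what you want to find out early.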
