Bug in Perl or I don't understand something about regexp matching and perl variables?

Posted on 开发者 (https://www.devze.com), 2023-03-13 21:09. Source: the web.

For a long time, I thought that parameters in Perl subs were passed by value. Now I have hit something that I don't understand:

use strict;
use warnings;

use Data::Dumper;

sub p {
    print STDERR "Before match: " . Data::Dumper->Dump([[@_]]) . "\n";
    "1" =~ /1/;
    print STDERR "After  match: " . Data::Dumper->Dump([[@_]]) . "\n";
}

my $line = "jojo.tsv.bz2";

if ($line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
    p($1, $2 || 'none');
    p([$1, $2 || 'none']);
}

On the first invocation of p(), once the regexp match has executed, the values in @_ become undef. On the second invocation everything is OK (values passed as an array ref are not affected).

This was tested with Perl versions 5.8.8 (CentOS 5.6) and 5.12.3 (Fedora 14).

The question is: how can a regexp match destroy the contents of @_ when it was built from $1, $2, etc.? (Other values, if you add them, are not affected.)

The perlsub man page says:

The array @_ is a local array, but its elements are aliases for the actual scalar parameters.

So when you pass $1 to a subroutine, inside that subroutine $_[0] is an alias for $1, not a copy of $1. Therefore it gets modified by the regexp match in your p.
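To see the aliasing on its own, without any regexp involved, here is a minimal sketch. The sub names double_in_place and double_copy are made up for this illustration and are not from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: doubles its first argument in place.
# $_[0] is an alias for the caller's scalar, so assigning to it
# changes the caller's variable.
sub double_in_place {
    $_[0] *= 2;
    return;
}

# Same operation on a private copy; the caller's variable is untouched.
sub double_copy {
    my ($n) = @_;
    $n *= 2;
    return $n;
}

my $x = 21;
double_in_place($x);
print "after double_in_place: $x\n";                  # 42

my $y = 21;
my $doubled = double_copy($y);
print "after double_copy: $y (returned $doubled)\n";  # 21 (returned 42)
```

The same mechanism is at work when the caller's scalar is $1: the sub does not assign to it, but the match inside the sub changes what $1 refers to.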

In general, the start of every Perl subroutine should look something like this:

my @args = @_;

...or this:

my ($arg1, $arg2) = @_;

...or this:

my $arg = shift;

And any capturing regexp should be used like this:

my ($match1, $match2) = $str =~ /my(funky)(regexp)/;

Without such disciplines, you are likely to be driven mad by subtle bugs.
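Applied to the code from the question, those two habits might look like the sketch below. p_copy is a hypothetical variant of p() that returns what it saw instead of dumping it, and $ext and $comp are names invented here.

```perl
use strict;
use warnings;

# Hypothetical variant of p() from the question: copy @_ first,
# then run the same unrelated match.
sub p_copy {
    my @args = @_;    # snapshot of the values at call time
    "1" =~ /1/;       # changes $1/$2 inside this sub, but not @args
    return @args;
}

my $line = "jojo.tsv.bz2";

# List-context match: the captures go straight into lexicals,
# so $1 and $2 never need to be passed around.
my ($ext, $comp) = $line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i;

if (defined $ext) {
    $comp = 'none' unless defined $comp;
    my @seen = p_copy($ext, $comp);
    print "p_copy saw: @seen\n";    # p_copy saw: tsv bz2
}
```

Either habit alone is enough to avoid the problem here; using both means the sub is safe whatever the caller passes.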

As suggested, copying the args in every sub is a good idea (if only to document what they are by giving them a non-punctuation name).

However, it's also a good idea to never pass global variables; pass "$1", "$2", not $1, $2. (This applies to things like $DBI::errstr too.)
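The difference between $1 and "$1" can be checked with a small sketch like this one. arg_after_match is a hypothetical helper, and the regexp is a simplified version of the one in the question. Quoting interpolates $1 into a new string, so the sub receives an alias to that temporary copy rather than to $1 itself.

```perl
use strict;
use warnings;

# Hypothetical helper: runs an unrelated match, then reports
# what its first argument looks like afterwards.
sub arg_after_match {
    "1" =~ /1/;
    return defined $_[0] ? $_[0] : 'undef';
}

if ("jojo.tsv.bz2" =~ /\.([a-z0-9]+)\.(bz2)$/) {
    # Aliased to $1: given the behaviour reported in the question,
    # this is expected to report undef.
    print "bare \$1:   ", arg_after_match($1), "\n";

    # Aliased to a fresh string holding a copy of $1: reports tsv.
    print "quoted \$1: ", arg_after_match("$1"), "\n";
}
```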

I am not quite sure why this happens, but I would say you should use something like my $arg1 = shift; my $arg2 = shift; and then use $arg1 and $arg2 in your sub.

Using the Perl debugger, you will see that @_ looks different in the two sub calls:

1st call: Before match:

x @_
0  'tsv'
1  'bz2'

After match:

x @_
0  undef
1  undef

I think this was overwritten by the match.

2nd call: Before match:

x @_
0  ARRAY(0xc2b6e0)
    0  'tsv'
    1  'bz2'

After match:

x @_
0  ARRAY(0xc2b6e0)
    0  'tsv'
    1  'bz2'

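So this wasn't overwritten because of the different structure: [$1, $2 || 'none'] copies the values into a new anonymous array before the call, so @_ holds only a reference to that array rather than aliases to $1 and $2.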

Hope this helps a little.
