For a long time I assumed that parameters to Perl subs are passed by value. Now I've hit something I don't understand:
use strict;
use warnings;
use Data::Dumper;
sub p {
print STDERR "Before match: " . Data::Dumper->Dump([[@_]]) . "\n";
"1" =~ /1/;
print STDERR "After match: " . Data::Dumper->Dump([[@_]]) . "\n";
}
my $line = "jojo.tsv.bz2";
if ($line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i) {
p($1, $2 || 'none');
p([$1, $2 || 'none']);
}
On the first invocation of p(), the values in @_ become undef once the regexp match has executed. On the second invocation everything is fine (values passed inside an array ref are not affected).
This was tested with Perl versions 5.8.8 (CentOS 5.6) and 5.12.3 (Fedora 14).
The question is: how can a regexp match destroy the contents of @_ when it was built from $1, $2, etc.? (Other values, if you add them, are not affected.)
The perlsub man page says:
The array @_ is a local array, but its elements are aliases for the actual scalar parameters.
So when you pass $1 to a subroutine, inside that subroutine $_[0] is an alias for $1, not a copy of $1. Therefore it gets modified by the regexp match in your p.
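To make the aliasing concrete, here is a minimal sketch. The subs bump and show_after_match are hypothetical names invented for this illustration; the second half mirrors the situation in the question, where the reported result is undef.

```perl
use strict;
use warnings;

# Hypothetical helper: writes to its first argument through the alias in @_.
sub bump {
    $_[0]++;    # modifies the caller's variable, not a copy
}

our $n = 1;
bump($n);
print "n = $n\n";    # n = 2

# Hypothetical helper: $_[0] is an alias for the caller's $1, so a later
# successful match inside the sub changes what $_[0] yields.
sub show_after_match {
    "x" =~ /x/;    # successful match with no capture groups
    return defined $_[0] ? $_[0] : 'undef';
}

if ("abc" =~ /(b)/) {
    # The question reports undef here on Perl 5.8.8 and 5.12.3.
    print show_after_match($1), "\n";
}
```

The first call shows the alias with an ordinary variable; the second shows why the same mechanism is surprising when the argument is a capture variable.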
In general, the start of every Perl subroutine should look something like this:
my @args = @_;
...or this:
my ($arg1, $arg2) = @_;
...or this:
my $arg = shift;
And any capturing regexp should be used like this:
my ($match1, $match2) = $str =~ /my(funky)(regexp)/;
Without such disciplines, you are likely to be driven mad by subtle bugs.
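Applied to the code in the question, those two habits might look like the sketch below. The sub name p_copy and the variable names are assumptions made for this example, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical rewrite of p(): copy the arguments before any regexp runs.
sub p_copy {
    my ($ext, $compression) = @_;    # copies are made here
    "1" =~ /1/;                      # cannot affect the copies
    return "$ext/$compression";
}

my $line = "jojo.tsv.bz2";
our $result;

# Capture in list context straight into lexicals instead of relying on $1, $2.
my ($ext, $comp) = $line =~ /\.([a-z0-9]+)(?:\.(bz2|gz|7z|zip))?$/i;
if (defined $ext) {
    $result = p_copy($ext, $comp || 'none');
    print "$result\n";    # tsv/bz2
}
```

Either habit alone would avoid the problem here; using both means neither the caller nor the sub depends on the capture variables staying unchanged.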
As suggested, copying the args in every sub is a good idea (if only to document what they are by giving them a non-punctuation name).
However, it's also a good idea to never pass global variables; pass "$1", "$2", not $1, $2. (This applies to things like $DBI::errstr too.)
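A small sketch of the quoting trick, for a sub that reads @_ directly. The sub name first_arg_after_match is made up for this example. Interpolating $1 into a string builds a new temporary value, so $_[0] aliases that temporary rather than $1 itself.

```perl
use strict;
use warnings;

# Hypothetical sub that reads @_ directly after running its own match.
sub first_arg_after_match {
    "1" =~ /1/;    # would change $1, but $_[0] is not an alias for $1 here
    return defined $_[0] ? $_[0] : 'undef';
}

our $seen;
if ("jojo.tsv" =~ /\.([a-z0-9]+)$/) {
    $seen = first_arg_after_match("$1");    # pass a stringified copy
    print "$seen\n";                        # tsv
}
```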
I am not quite sure why this happens but I would say you should use something like my $arg1 = shift; my $arg2 = shift; and use $arg1 and $arg2 in your sub.
Using the perl debugger you will see that @_ looks different in the 2 sub calls:
1st call: Before match:
x @_
0 'tsv'
1 'bz2'
After match:
x @_
0 undef
1 undef
I think this was overwritten by the match.
2nd call: Before match:
x @_
0 ARRAY(0xc2b6e0)
0 'tsv'
1 'bz2'
After match:
x @_
0 ARRAY(0xc2b6e0)
0 'tsv'
1 'bz2'
So maybe this wasn't overwritten because of the different structure(?).
Hope this helps a little.