Why would I return a hash or a hash reference in Perl?

Which of the three approaches below is the most effective? (I know they all accomplish the same thing, but which one would most people choose, and why?)

File a.pl

my %hash = build_hash();
# Do stuff with hash using $hash{$key}
sub build_hash
{
    # Build some hash
    my %hash = ();
    my @k = qw(hi bi no th xc ul 8e r);
    for ( @k )
    {
        $hash{$_} = 1;    # the loop variable is $_, not $k
    }

    # Does this return a copy of the hash??
    return %hash;
}

File b.pl

my $hashref = build_hash();
# Do stuff with hash using $hashref->{$key}
sub build_hash
{
    # Build some hash
    my %hash = ();
    my @k = qw(hi bi no th xc ul 8e r);
    for ( @k )
    {
        $hash{$_} = 1;    # the loop variable is $_, not $k
    }

    # Just return a reference (smaller than making a copy?)
    return \%hash;
}

File c.pl

my %hash = %{build_hash()};
# Do stuff with hash using $hash{$key}
# Is this better, because we no longer have to dereference the hashref with -> each time?

sub build_hash
{
    # Build some hash
    my %hash = ();
    my @k = qw(hi bi no th xc ul 8e r);
    for ( @k )
    {
        $hash{$_} = 1;    # the loop variable is $_, not $k
    }

    return \%hash;
}

I prefer returning a hash ref for two reasons. One, it uses a bit less memory since there's no copy. Two, it lets you do this if you just need one piece of the hash.

my $value = build_hash()->{$key};

Learn to love hash references, you're going to be seeing them a lot once you start using objects.
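To make the second reason concrete, here is a minimal sketch. The subs build_ref and build_list are hypothetical names, one returning a reference and one returning a flattened list; with the list-returning version the caller needs a temporary hash before it can look up a single key.

```perl
use strict;
use warnings;

# Hypothetical builder in the style of b.pl: returns a reference, no copy.
sub build_ref {
    my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);
    return \%hash;
}

# Hypothetical builder in the style of a.pl: returns a flattened list.
sub build_list {
    return map { $_ => 1 } qw(hi bi no th xc ul 8e r);
}

# With a reference, one value can be fetched straight off the call.
my $from_ref = build_ref()->{hi};

# With a list, the pairs have to be copied into a hash first.
my %tmp       = build_list();
my $from_list = $tmp{hi};

print "$from_ref $from_list\n";    # prints "1 1"
```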

Why not return both? Context is a very powerful feature in Perl to allow your functions to "do what you mean". Often the decision of which is a better return value depends on how the calling code plans to use the value, which is exactly why Perl has the builtin wantarray.

sub build_hash {
    my @keys = qw(hi bi no th xc ul 8e r);
    my %hash;
    @hash{@keys} = (1) x @keys;
    return wantarray ? %hash : \%hash;
}

my %hash = build_hash;  # list context, a list of (key => value) pairs
my $href = build_hash;  # scalar context, a hash reference
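One thing to watch with the wantarray approach is that the context is not always obvious at the call site. The sketch below assumes the key list from the question and a hypothetical helper count_args; it shows that a call inside an argument list is in list context, so the callee receives the flattened pairs unless the caller forces scalar context.

```perl
use strict;
use warnings;

my @keys = qw(hi bi no th xc ul 8e r);    # assumed: the keys from the question

sub build_hash {
    my %hash;
    @hash{@keys} = (1) x @keys;
    return wantarray ? %hash : \%hash;
}

# Hypothetical helper that just reports how many arguments it received.
sub count_args { return scalar @_ }

my %copy = build_hash();          # list context: key/value pairs
my $ref  = build_hash();          # scalar context: hash reference
my $one  = build_hash()->{hi};    # the -> puts the call in scalar context

# Argument lists are list context, so the pairs are flattened...
my $n_list = count_args(build_hash());
# ...unless scalar context is forced explicitly.
my $n_scalar = count_args(scalar build_hash());

print "$n_list $n_scalar\n";      # prints "16 1"
```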

I would return the reference to save the processing time of flattening the hash into a list of scalars, building the new hash and (possibly) garbage collecting the local hash in the subroutine.

What you're looking for is a hash slice:

# assigns the value 1 to every element of the hash

my %hash;                                   # declare an empty hash
my @list = qw(hi bi no th xc ul 8e r);      # declare the keys as a list
@hash{@list} =                              # for every key listed in @list,
                (1) x @list;                # ...assign to it the corresponding value in this list
                                            # which is (1, 1, 1, 1, 1...)  (@list in scalar context
                                            #   gives the number of elements in the list)

The x operator is described at perldoc perlop.

See perldoc perldsc and perldoc perlreftut for tutorials on data structures and references (both must-reads for beginners and experts alike). Hash slices themselves are mentioned in perldoc perldata.

Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is.

A function called in list context always returns a list (returning a scalar is essentially returning a list of one element). In list context a hash flattens to a list of key/value pairs, so you can assign one to the other interchangeably (assuming the list has an even number of elements and there are no duplicate keys, which would cause some values to be lost during the conversion):

use strict; use warnings;
use Data::Dumper;

sub foo
{
    return qw(key1 value1 key2 value2);
}

my @list = foo();
my %hash = foo();

print Dumper(\@list);
print Dumper(\%hash);

gives:

$VAR1 = [
          'key1',
          'value1',
          'key2',
          'value2'
        ];

$VAR1 = {
          'key2' => 'value2',
          'key1' => 'value1'
        };

PS. I highly recommend writing up small sample programs like the one above to play around with data structures and to see what happens. You can learn a lot by experimenting!
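In the same spirit, here is a small experiment for the key-collision caveat mentioned earlier. The sub dup_pairs is a made-up example that returns a list in which key1 appears twice; the array keeps every element, while the hash keeps only the last value for the repeated key.

```perl
use strict;
use warnings;

# Hypothetical sub: a list of pairs in which 'key1' occurs twice.
sub dup_pairs { return qw(key1 first key2 value2 key1 second) }

my @list = dup_pairs();    # all six elements survive in the array
my %hash = dup_pairs();    # the later 'key1' pair overwrites the earlier one

printf "%d list elements, %d hash keys, key1 => %s\n",
    scalar @list, scalar keys %hash, $hash{key1};
# prints "6 list elements, 2 hash keys, key1 => second"
```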

Quoting Ether: "Regarding returning a hash from a function, normally you should return the hash itself, not a reference. You could use a reference if the hash is huge and memory or time is a concern, but that shouldn't be your first worry -- getting the code working is."

I'm going to have to disagree with Ether here. There was a time when I took that position, but quickly found myself descending into a hell of having to remember which subs returned hashes and which returned hashrefs, which was a rather serious impediment to just getting the code working. It's important to standardize on either always returning a hash/array or always returning a hashref/arrayref unless you want to be constantly tripping over yourself.

As for which to standardize on, I see several advantages to going with references:

When you return a hash or array, what you're actually returning is a list containing a flattened copy of the original hash/array. Just like passing in hash/array parameters to a sub, this has the disadvantage that you can only send one list at a time. Granted, you don't often need to return multiple lists of values, but it does happen, so why choose to standardize on doing things in a way which precludes it?
The (usually negligible) performance/memory benefits of returning a single scalar rather than a potentially much larger chunk of data.
It maintains consistency with OO code, which frequently passes objects (i.e., blessed references) back and forth.
If, for whatever reason, it's important that you have a fresh copy of the hash/array rather than a reference to the original, the calling code can easily make one, as the OP demonstrated in c.pl. If you return a copy of the hash, though, there's no way for the caller to turn that into a reference to the original. (In cases where this is advantageous, the function can make a copy and return a reference to the copy, thus protecting the original while also avoiding the "this returns hashes, that returns hashrefs" hell I mentioned earlier.)
As Schwern mentioned, it's real nice to be able to do my $foo = $obj->some_data->{key}.
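The first point in the list above, that a flattened return can only carry one list, can be sketched as follows. The sub split_by_length is a hypothetical example that builds two hashes; returning references keeps the boundary between them, which a flattened return would lose.

```perl
use strict;
use warnings;

# Hypothetical sub: sorts words into two hashes and returns both.
sub split_by_length {
    my (%short, %long);
    for my $word (@_) {
        if (length($word) <= 2) { $short{$word} = 1 }
        else                    { $long{$word}  = 1 }
    }
    # Two scalars come back, so the caller can tell the hashes apart.
    # "return (%short, %long)" would merge them into one flat list.
    return (\%short, \%long);
}

my ($short, $long) = split_by_length(qw(hi bi perl hash r));

printf "%d short, %d long\n", scalar keys %$short, scalar keys %$long;
# prints "3 short, 2 long"
```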

The only advantage I can see to always returning hashes/arrays is that it is easier for those who don't understand references or aren't comfortable working with them. Given that comfort with references takes a matter of weeks or months to develop, followed by years or decades of working with them fluently, I don't consider this a meaningful benefit.

a.pl and c.pl require a copy of the hash to be made (after which the hash internal to the function is freed). b.pl, on the other hand, builds the hash just once and needs very little extra memory to return a reference on which you can operate. Thus b.pl is likely to be the most efficient of the three, in both space and time.
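Besides the cost, the copy also changes what the caller is holding. The sketch below uses a hypothetical long-lived hash %store (unlike the question, where the hash is local to the sub) so the difference is visible: a change made through the reference shows up in the original, while a change to the copy does not.

```perl
use strict;
use warnings;

my %store = (hi => 1, bi => 1);    # hypothetical hash that outlives the subs

sub get_ref  { return \%store }    # like b.pl: hands back the same hash
sub get_copy { return %store }     # like a.pl: hands back a flattened copy

my $ref  = get_ref();
my %copy = get_copy();

$ref->{added_via_ref} = 1;         # modifies %store itself
$copy{added_via_copy} = 1;         # only modifies the caller's copy

print join(',', sort keys %store), "\n";    # prints "added_via_ref,bi,hi"
```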

I'm going to go against the grain and what everyone else is saying, and say that I prefer to have my data returned as a hash (well, as an even-sized list which is likely to be interpreted as a hash). I work in an environment where we tend to do things like the following code snippet, and it's much easier to combine and sort and slice and dice when you don't have to dereference every other line. (It's also nice to know that someone can't damage your hashref because you passed the entire thing by value. edit: unless you've got references to other objects/hashes/arrays in the hash-values, then you're in trouble anyway).

my %filtered_config_slice = 
   hashgrep { $a !~ /^apparent_/ && defined $b } (
   map { $_->build_config_slice(%some_params, some_other => 'param') } 
   ($self->partial_config_strategies, $other_config_strategy)
);

This approximates something that my code might do: building a configuration for an object based on various configuration strategy objects (some of which the object knows about inherently, plus some extra guy) and then filters out some of them as irrelevant.

(Yes, we have nice tools like hashgrep and hashmap and lkeys that do useful things to hashes. $a and $b get set to the key and the value of each item in the list, respectively). (Yes, we have people who can program at this level. Hiring is obnoxious, but we have a quality product.)
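hashgrep is an in-house tool and its real implementation is not shown here, but for readers who want to try the style, pairgrep from the core List::Util module (in reasonably recent versions) behaves in a similar way: it sets $a and $b to each key and value and returns the pairs for which the block is true. This is only a rough stand-in under that assumption; slice_one and slice_two are made-up substitutes for the build_config_slice calls.

```perl
use strict;
use warnings;
use List::Util qw(pairgrep);

# Hypothetical stand-ins for what build_config_slice might return.
sub slice_one { return (apparent_size => 10, colour => 'red') }
sub slice_two { return (depth => undef, width => 3) }

# Combine the flattened slices, then drop "apparent_" keys and undef values.
my %filtered = pairgrep { $a !~ /^apparent_/ && defined $b }
               map { $_->() } (\&slice_one, \&slice_two);

print join(', ', map { "$_=$filtered{$_}" } sort keys %filtered), "\n";
# prints "colour=red, width=3"
```

Because every step hands back a plain even-sized list, the pieces chain together without any dereferencing.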

If you don't intend to do anything like this resembling functional programming, or if you need more performance (have you profiled?) then sure, use hashrefs.

Take care: a.pl returns a list with an even number of elements, not a hash. When you then assign such a list to a hash variable, the hash will be built with the elements at the even indices as keys and the elements at the odd indices as values. [EDIT: That was how I always saw the matter, but sub { ... %hash } actually behaves a bit differently than sub { ... @list }. ]

For the same reason, building a hash, the way you describe, is as simple as:

my %hash = map { $_ => 1 } qw(hi bi no th xc ul 8e r);
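On the earlier remark that sub { ... %hash } and sub { ... @list } behave differently: as far as I can tell the difference only shows up in scalar context. The sketch below uses two hypothetical subs; in list context both return four elements, but in scalar context the array gives its element count while the hash gives its own scalar-context value (a key count on newer perls, a bucket-usage string on older ones), so the exact hash value is deliberately not printed.

```perl
use strict;
use warnings;

# Hypothetical subs: same data, returned from a hash and from an array.
sub returns_hash  { my %h = (a => 1, b => 2); return %h }
sub returns_array { my @l = (a => 1, b => 2); return @l }

# List context: both hand back four elements.
my @from_hash  = returns_hash();
my @from_array = returns_array();

# Scalar context: the array yields its element count; the hash yields
# whatever a hash evaluates to in scalar context on this perl.
my $array_scalar = returns_array();
my $hash_scalar  = returns_hash();

print scalar(@from_hash), " ", scalar(@from_array), " $array_scalar\n";
# prints "4 4 4"
```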

My personal rule of thumb is to avoid references unless I really need them (e.g. nested structures, or when you really need to pass around a reference to the same thing).

EDIT: I thought about it a little and I think passing around hash refs is probably better after all, due to the way we use a hash. The paragraph above still holds for array refs though.

Thanks for your comments Schwern and Ether.