开发者

Is perl's each function worth using?

开发者 https://www.devze.com 2022-12-22 08:18 出处:网络
From perldoc -f each we read: There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash,

From perldoc -f each we read:

There is a single iterator for each hash, shared by all each, keys, and values function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating keys HASH or values HASH.

The iterator is not reset when you leave the scope containing the each(), and this can lead to bugs:

my %h = map { $_, 1 } qw(1 2 3);
while (my $k = each %h) { print "1: $k\n"; last }
while (my $k = each %h) { print "2: $k\n"       }

Output:

1: 1
2: 3
2: 2

What are the common worka开发者_如何学Crounds for this behavior? And is it worth using each in general?


I think it is worth using as long as you are aware of this. It's ideal when you need both key and value in iteration:

while (my ($k,$v) = each %h) {
    say "$k = $v";
}

In your example you can reset the iterator by adding keys %h; like so:

my %h = map { $_ => 1 } qw/1 2 3/;
while (my $k = each %h) { print "1: $k\n"; last }
keys %h;  # reset %h
while (my $k = each %h) { print "2: $k\n" }

From Perl 5.12 each will also allow iteration on an array.


I find each to be very handy for idioms like this:

my $hashref = some_really_complicated_method_that_builds_a_large_and_deep_structure();
while (my ($key, $value) = each %$hashref)
{
    # code that does stuff with both $key and $value
}

Contrast that code to this:

my $hashref = ...same call as above
foreach my $key (keys %$hashref)
{
    my $value = $hashref->{$key};
    # more code here...
}

In the first case, both $key and $value are immediately available to the body of the loop. In the second case, $value must be fetched first. Additionally, the list of keys of $hashref may be really huge, which takes up memory. This is occasionally an issue. each does not incur such overhead.

However, the drawbacks of each are not instantly apparent: if aborting from the loop early, the hash's iterator is not reset. Additionally (and I find this one more serious and even less visible): you cannot call keys(), values() or another each() from within this loop. To do so would reset the iterator, and you would lose your place in the while loop. The while loop would continue forever, which is definitely a serious bug.


each is too dangerous to ever use, and many style guides prohibit its use completely. The danger is that if a cycle of each is aborted before the end of the hash, the next cycle will start there. This can cause very hard-to-reproduce bugs; the behavior of one part of the program will depend on a completely unrelated other part of the program. You might use each right, but what about every module ever written that might use your hash (or hashref; it's the same)?

keys and values are always safe, so just use those. keys makes it easier to traverse the hash in deterministic order, anyway, which is almost always more useful. (for my $key (sort keys %hash) { ... })


each is not only worth using, it's pretty much mandatory if you want to loop over all of a tied hash too big for memory.

A void-context keys() (or values, but consistency is nice) before beginning the loop is the only "workaround" necessary; is there some reason you are looking for some other workaround?


use the keys() function to reset the iterator. See the faq for more info


each has a buit-in, hidden global variable that can hurt you. Unless you need this behavior, it's safer to just use keys.

Consider this example where we want to group our k/v pairs (yes, I know printf would do this better):

#!perl

use strict;
use warnings;

use Test::More 'no_plan';

{   my %foo = map { ($_) x 2 } (1..15);

    is( one( \%foo ), one( \%foo ), 'Calling one twice works with 15 keys' );
    is( two( \%foo ), two( \%foo ), 'Calling two twice works with 15 keys' );
}

{   my %foo = map { ($_) x 2 } (1..105);

    is( one( \%foo ), one( \%foo ), 'Calling one twice works with 105 keys' );
    is( two( \%foo ), two( \%foo ), 'Calling two twice works with 105 keys' );
}


sub one {
    my $foo = shift;

    my $r = '';

    for( 1..9 ) {
        last unless my ($k, $v) = each %$foo;

        $r .= "  $_: $k -> $v\n";
    }
    for( 10..99 ) {
        last unless my ($k, $v) = each %$foo;

        $r .= " $_: $k -> $v\n";
    }

    return $r;
}

sub two {
    my $foo = shift;

    my $r = '';

    my @k = keys %$foo;

    for( 1..9 ) {
        last unless @k;
        my $k = shift @k;

        $r .= "  $_: $k -> $foo->{$k}\n";
    }
    for( 10..99 ) {
        last unless @k;
        my $k = shift @k;

        $r .= "  $_: $k -> $foo->{$k}\n";
    }

    return $r;
}

Debugging the error shown in the tests above in a real application would be horribly painful. (For better output use Test::Differences eq_or_diff instead of is.)

Of course one() can be fixed by using keys to clear the iterator at the start and end of the subroutine. If you remember. If all your coworkers remember. It's perfectly safe as long as no one forgets.

I don't know about you, but I'll just stick with using keys and values.


It's best if used as it's name: each. It's probably the wrong thing to use if you mean "give me the first key-value pair," or "give me the first two pairs" or whatever. Just keep in mind that the idea is flexible enough that each time you call it, you get the next pair (or key in a scalar context).


each() can be more efficient if you are iterating through a tied hash, for example a database that contains millions of keys; that way you don't have to load all the keys in memory.

0

精彩评论

暂无评论...
验证码 换一张
取 消