Can Perl's "exists" modify data structure values?

devze.com (https://www.devze.com), 2023-03-24 16:28. Source: the web
I have a nested hash table that looks like this:

my %myhash = (
    "val1" => {
        "A/B.c" => {
            "funct1" => 1
        }
    },
    "val2" => {
        "C/D.c" => {
            "funct2" => 1
        }
    }
);

My objective with this data structure is to produce different values based on whether certain hash tables exist. For example,

sub mysub
{
    my $val = shift;
    my $file = shift;
    my $funct = shift;

    if (exists $myhash{$val}{$file}{$funct}) {
        return "return1";
    }
    if (exists $myhash{$val}{$file}) {
        return "return2";
    }
    return "return3";
}

The behavior I'm encountering is as follows. At one point the function is called with $val = "val1", $file = "C/D.c", and $funct = "funct3".

For that call, the return value I get is "return2". These are my observations with the Perl debugger:

  1. Break at the first "if" in mysub.
  2. Run p $myhash{"val1"}{"C/D.c"} ==> Prints a blank line. Okay. Continue, and this "if" is skipped.
  3. Continue and break at the second "if" in mysub.
  4. Run p $myhash{"val1"}{"C/D.c"} ==> Prints "HASH(0x...)". WTF moment. The function returns "return2".

This tells me that running the first "if" modified the data structure, which allows the second "if" to pass when in fact it shouldn't. The function I'm running is identical to the function shown above; this one is just sanitized. Does anyone have an explanation for me? :)


Yes. This is because of autovivification. See the bottom of the exists documentation:

Although the most deeply nested array or hash will not spring into existence just because its existence was tested, any intervening ones [autovivified arrays or hashes] will [spring into existence]. Thus $ref->{"A"} and $ref->{"A"}->{"B"} will spring into existence due to the existence test for the $key element above. This happens anywhere the arrow operator is used...

Where "...test for the $key element above..." refers to:

if (exists $ref->{A}->{B}->{$key})  { }
if (exists $hash{A}{B}{$key})       { } # same idea, implicit arrow
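
To see the effect in isolation, here is a minimal sketch using a cut-down version of the question's data. The variable names ($before, $deep, $after) are made up for illustration; the point is only that the middle test creates the intermediate key as a side effect.

```perl
use strict;
use warnings;

# Cut-down version of the hash from the question.
my %hash = ( val1 => { 'A/B.c' => { funct1 => 1 } } );

# Before the deep test, "C/D.c" is not a key under "val1".
my $before = exists $hash{val1}{'C/D.c'} ? 1 : 0;

# Testing one level deeper has to dereference $hash{val1}{'C/D.c'},
# which autovivifies it as an empty hash reference.
my $deep = exists $hash{val1}{'C/D.c'}{funct3} ? 1 : 0;

# After the deep test, the intermediate key exists.
my $after = exists $hash{val1}{'C/D.c'} ? 1 : 0;

print "before=$before deep=$deep after=$after\n";   # before=0 deep=0 after=1
```

This matches what the debugger showed in the question: the tested element itself is not created, but the level above it is.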

Happy coding.


As pst rightly points out, this is autovivification. There are at least two ways to avoid it. The first (and most common in my experience) is to test at each level:

if (
    exists $h{a}       and
    exists $h{a}{b}    and
    exists $h{a}{b}{c}
) {
    ...
}

Because the "and" operator short-circuits, the second and third calls to exists are not executed if the earlier levels don't exist.
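
Applied to the function from the question, the level-by-level test might look like the sketch below. This is one possible rewrite, not the asker's actual code: it keeps the same three return values but checks the outer levels first, so nothing is dereferenced below a missing key.

```perl
use strict;
use warnings;

my %myhash = (
    val1 => { 'A/B.c' => { funct1 => 1 } },
    val2 => { 'C/D.c' => { funct2 => 1 } },
);

sub mysub {
    my ($val, $file, $funct) = @_;

    # Stop at the first missing level; "and" short-circuits, so the
    # second exists never runs when $myhash{$val} is absent.
    return "return3"
        unless (exists $myhash{$val} and exists $myhash{$val}{$file});

    # Both outer levels exist here, so this test cannot autovivify anything.
    return exists $myhash{$val}{$file}{$funct} ? "return1" : "return2";
}

print mysub("val1", "C/D.c", "funct3"), "\n";   # return3
print mysub("val1", "A/B.c", "funct3"), "\n";   # return2
print mysub("val1", "A/B.c", "funct1"), "\n";   # return1
```

With this ordering, the call that surprised the asker returns "return3" and leaves %myhash unchanged.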

A more recent solution is the autovivification pragma (available from CPAN):

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

$Data::Dumper::Useqq = 1;

{
    my %h;

    if (exists $h{a}{b}{c}) {
        print "impossible, it is empty\n";
    }

    print Dumper \%h;
}

{
    no autovivification;

    my %h;

    if (exists $h{a}{b}{c}) {
        print "impossible, it is empty\n";
    }

    print Dumper \%h;
}

A third method that ysth mentions in the comments has the benefits of being in core (like the first example) and of not repeating the exists function call; however, I believe it does so at the expense of readability:

if (exists ${ ${ $h{a} || {} }{b} || {} }{c}) {
    ...
}

It works by replacing any level that doesn't exist with a hashref to take the autovivification. These hashrefs will be discarded after the if statement is done executing. Again we see the value of short-circuiting logic.
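
A quick way to convince yourself that this form leaves the original hash alone is to run it against an empty hash and count the keys afterwards. The sketch below does that; $found and $key_count are illustrative names only.

```perl
use strict;
use warnings;

my %h;   # deliberately empty

# Each missing level is replaced by a throwaway anonymous hash,
# so the dereferences never touch %h itself.
my $found = exists ${ ${ $h{a} || {} }{b} || {} }{c} ? 1 : 0;

# If anything had been autovivified, %h would now have a key "a".
my $key_count = keys %h;

print "found=$found keys=$key_count\n";   # found=0 keys=0
```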

Of course, all three of these methods make an assumption about the data the hash is expected to hold. A more robust method includes calls to ref or reftype, depending on how you want to treat objects (there is a third option that takes into account classes that overload the hash indexing operator, but I can't remember its name):

if (
    exists $h{a}           and
    ref $h{a} eq ref {}    and
    exists $h{a}{b}        and
    ref $h{a}{b} eq ref {} and
    exists $h{a}{b}{c}
) {
    ...
}
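
The difference between ref and reftype only matters for blessed references. As a rough illustration, assuming a hypothetical class name My::Record that exists only for this example, reftype from the core Scalar::Util module reports the underlying type while ref reports the class:

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

my $plain  = { b => 1 };
# My::Record is a made-up class name, used only for illustration.
my $object = bless { b => 1 }, 'My::Record';

my $ref_plain      = ref $plain;          # "HASH"
my $ref_object     = ref $object;         # "My::Record"
my $reftype_object = reftype($object);    # "HASH"

# A test written as "ref $x eq ref {}" accepts $plain but rejects $object;
# a test written as "reftype($x) eq 'HASH'" accepts both.
print "$ref_plain $ref_object $reftype_object\n";   # HASH My::Record HASH
```

Which one you want depends on whether digging into an object's internals counts as a legitimate lookup in your code.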

In the comments, pst asked if something like myExists($ref,"a","b","c") exists. I am certain there is a module in CPAN that does something like that, but I am not aware of it. There are too many edge cases for me to find that useful, but a simple implementation would be:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

sub safe_exists {
    my ($ref, @keys) = @_;

    for my $k (@keys) {
        return 0 unless ref $ref eq ref {} and exists $ref->{$k};
        $ref = $ref->{$k};
    }
    return 1;
}

my %h = (
    a => {
        b => {
            c => 5,
        },
    },
);

unless (safe_exists \%h, qw/x y z/) {
    print "x/y/z doesn't exist\n";
}

unless (safe_exists \%h, qw/a b c d/) {
    print "a/b/c/d doesn't exist\n";
}

if (safe_exists \%h, qw/a b c/) {
    print "a/b/c does exist\n";
}

print Dumper \%h;


If you want to turn off autovivification, you can do that lexically with the autovivification pragma:

{
    no autovivification;

    if (exists $hash{A}{B}{$key}) { ... }
}

I wrote more about this at The Effective Perler, in "Turn off autovivification when you don't want it".
