How can I cleanly turn a nested Perl hash into a non-nested one?_问答_开发者

Assume a nested hash structure %old_hash ..

my %old_hash;
$old_hash{"foo"}{"bar"}{"zonk"} = "hello";

.. which we want to "flatten" (sorry if that's the wrong terminology!) to a non-nested hash using the sub 开发者_Python百科&flatten(...) so that ..

my %h = &flatten(\%old_hash);
die unless($h{"zonk"} eq "hello");

The following definition of &flatten(...) does the trick:

sub flatten {
  my $hashref = shift;
  my %hash;
  my %i = %{$hashref};
  foreach my $ii (keys(%i)) {
    my %j = %{$i{$ii}};
    foreach my $jj (keys(%j)) {
      my %k = %{$j{$jj}};
      foreach my $kk (keys(%k)) {
        my $value = $k{$kk};
        $hash{$kk} = $value;
      }
    }
  }
  return %hash;
}

While the code given works it is not very readable or clean.

My question is two-fold:

In what ways does the given code not correspond to modern Perl best practices? Be harsh! :-)
How would you clean it up?

Your method is not best practices because it doesn't scale. What if the nested hash is six, ten levels deep? The repetition should tell you that a recursive routine is probably what you need.

sub flatten {
    my ($in, $out) = @_;
    for my $key (keys %$in) {
        my $value = $in->{$key};
        if ( defined $value && ref $value eq 'HASH' ) {
            flatten($value, $out);
        }
        else {
            $out->{$key} = $value;
        }
    }
}

Alternatively, good modern Perl style is to use CPAN wherever possible. Data::Traverse would do what you need:

use Data::Traverse;
sub flatten {
    my %hash = @_;
    my %flattened;
    traverse { $flattened{$a} = $b } \%hash;
    return %flattened;
}

As a final note, it is usually more efficient to pass hashes by reference to avoid them being expanded out into lists and then turned into hashes again.

First, I would use perl -c to make sure it compiles cleanly, which it does not. So, I'd add a trailing } to make it compile.

Then, I'd run it through perltidy to improve the code layout (indentation, etc.).

Then, I'd run perlcritic (in "harsh" mode) to automatically tell me what it thinks are bad practices. It complains that:

Subroutine does not end with "return"

Update: the OP essentially changed every line of code after I posted my Answer above, but I believe it still applies. It's not easy shooting at a moving target :)

There are a few problems with your approach that you need to figure out. First off, what happens in the event that there are two leaf nodes with the same key? Does the second clobber the first, is the second ignored, should the output contain a list of them? Here is one approach. First we construct a flat list of key value pairs using a recursive function to deal with other hash depths:

my %data = (
    foo  => {bar  => {baz  => 'hello'}},
    fizz => {buzz => {bing => 'world'}},
    fad  => {bad  => {baz  => 'clobber'}},
);


sub flatten {
    my $hash = shift;
    map {
        my  $value = $$hash{$_};
        ref $value eq 'HASH' 
            ? flatten($value) 
            : ($_ =>  $value)
    } keys %$hash
}

print join( ", " => flatten \%data), "\n";
# baz, clobber, bing, world, baz, hello

my %flat = flatten \%data;

print join( ", " => %flat ), "\n";
# baz, hello, bing, world          # lost (baz => clobber)

A fix could be something like this, which will create a hash of array refs containing all the values:

sub merge {
    my %out;
    while (@_) {
        my ($key, $value) = splice @_, 0, 2;
        push @{ $out{$key} }, $value
    }
    %out
}

my %better_flat = merge flatten \%data;

In production code, it would be faster to pass references between the functions, but I have omitted that here for clarity.

Is it your intent to end up with a copy of the original hash or just a reordered result?

Your code starts with one hash (the original hash that is used by reference) and makes two copies %i and %hash.

The statement my %i=%{hashref} is not necessary. You are copying the entire hash to a new hash. In either case (whether you want a copy of not) you can use references to the original hash.

You are also losing data if your hash in the hash has the same value as the parent hash. Is this intended?