Result: Many lines of HASH(0x1948958) ARRAY(0x1978250) ./directory/filename
Desired result: [Key of first hash] [Key of second hash] ./directory/filename #(elements of array, currently working)
Catch: Should carry across to N level structures, hence my attempt at using Data::Walk.
What I really want to do as I walk the structure is to reference the key that is being used. Kind of like Data::Dumper but tab-separated instead of in code format. I think the likely solutions (in order of preference) are:
- Some call to Data::Walk that I've overlooked.
- A better module for this task that I don't know about.
- A quick code snippet that I can inline
- My own module / fork of Data::Walk / Data::Dumper (big frown) that will add this functionality.
use strict;
use File::Basename;
use Data::Walk;
my $files;
while (<>) {
chomp;
#ls -l output in a file; referencing filename from it (8th column)
my @line = split(/ /, $_, 8);
#fileparse exported by File::Basename
my ($name,$path) = fileparse($line[7]);
open (my $fh, '<', $path . $name);
my $sha = Digest::SHA->new('sha1');
$sha->addfile($fh);
#finding files by basename, then unique hash, then however many places it is stored.
#question not why I don't use the hash as the first field.
#basename digest path
push(@{$files->{$name}->{$sha->hexdigest}}开发者_如何学Python, $path . $name);
}
my @val;
sub walkit {
$val[$Data::Walk::depth - 1] = $_;
if ($Data::Walk::depth == 3) {
print join("\t", @val), "\n";
}
}
&walk (\&walkit, %$files);
Gurus?
Edit: against my better judgement, I'll try to answer this question again.
Here's a simple approach to print what you want. Using Data::Walk is not feasible because you don't have key context when you are inside a hash (you just get a pointer to the container.)
This function works for somewhat complicated structures. Of course it will not give proper output if you put a function reference or something wonky in there.
use strict;
use warnings;
my $res;
sub walk {
my ($item, $path) = @_;
if (ref $item eq 'ARRAY') {
foreach (@$item) {
walk($_, $path);
}
} elsif (ref $item eq 'HASH') {
foreach (keys %$item) {
push @$path, $_;
walk($item->{$_}, $path);
pop @$path;
}
} else {
print join('-', @$path, $item), "\n";
}
}
my $struct = {
a => {
a1 => { a11 => [ 1, 2, 3 ] },
a2 => { a22 => [5, 6, 7] }
},
b => { b1 => [ 99 ], },
c => [ 100, 101, ],
d => [ 101, { d2 => { d3 => [200, 210] }, }, ],
};
walk $struct;
for my $name (keys(%$files)) {
for my $digest (keys(%{$files->{$name}})) {
my @qfns = @{ $files->{$name}{$digest} };
if (@qfns > 1) {
say "For $name and $digest,";
say " $_" for @qfns;
}
}
}
(I'm assuming you're looking for duplicates, so I print nothing when there's only one path associated with a name-digest combo. You can remove the if
if you want to print everything.)
Some other cleanup:
use strict;
use warnings;
use 5.010;
use Digest::SHA qw( );
use File::Basename qw( basename );
sub calc_digest {
my ($qfn) = @_;
open(my $fh, '<', $qfn) or die $!;
my $sha = Digest::SHA->new('sha1');
$sha->addfile($fh);
return $sha->hexdigest();
}
my $files;
while (<>) {
my $qfn = (split)[7];
my $name = basename($path);
my $digest = calc_digest($qfn);
push @{ $files->{$name}{$digest} }, $qfn;
}
("qfn" stands for "qualified file name", which means the path to the file, which wasn't what $path
contained. You were repeatedly building the path even though $line[7]
contained it.)
精彩评论