I have a giant list of file paths that is simply too large for our SCM to process. I need to whittle it down to the lowest common folder level. For example, given the following paths:
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10
Based on that, I would like to arrive at this:
//folder1/folder2
//folder1/folder3
//folderx/foldery
The folder list will be read from a text file, and is around 2M lines long.
Any help would be greatly appreciated.
This looks to be a good use for split() and hashes:
use strict;
use warnings;

# Assumption: the paths arrive on STDIN, one per line.
chomp( my @paths = <STDIN> );

my %seen;
foreach my $path ( @paths ) {
    $path =~ s|^//||;    # Strip off leading //
    my @elems = split( '/', $path );
    next if @elems < 2;  # Skip paths with fewer than two levels
    $seen{$elems[0]}{$elems[1]}++;
}
foreach my $rootpath ( sort keys %seen ) {
    foreach my $secondpath ( sort keys %{$seen{$rootpath}} ) {
        print "//" . $rootpath . "/" . $secondpath . "\n";
    }
}
If you only want to print out paths that have been seen twice or more, insert a next unless $seen{$rootpath}{$secondpath} > 1; before the print().
I haven't tested this so there could be syntax errors, but the code gives the general gist.
How about:
#!/usr/local/bin/perl
use strict;
use warnings;
use 5.010;
my %out;
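while (<DATA>) {
    chomp;
    # Capture the first two path components; skip lines that don't match
    # so a stale $1 from an earlier match is never reused.
    next unless m#^(//[^/]+/[^/]+)#;
    $out{$1} = 1;
}
say for keys %out;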
__DATA__
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10
Output (hash key order is arbitrary, so the lines may appear in a different order):
//folderx/foldery
//folder1/folder3
//folder1/folder2