Reduce folder lists to lowest common folder

开发者 https://www.devze.com 2023-04-08 15:53 Source: Internet
I have a giant list of file paths that are simply too large for our SCM to process. I need to whittle them down based on the lowest common level folder. For example, given the following paths:

//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10

Based on that, I would like to arrive at this:

//folder1/folder2
//folder1/folder3
//folderx/foldery

The folder list will be read from a text file and is around 2M lines long.

Any help would be greatly appreciated.


This looks to be a good use for split() and hashes:

use strict;
use warnings;

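my %seen;
# Read paths one per line from the file named on the command line (or STDIN).
while ( my $path = <> ) {
  chomp $path;
  $path =~ s|^//||; # Strip off leading //
  my @elems = split( '/', $path );
  next unless @elems >= 2; # Skip lines without two folder levels
  $seen{$elems[0]}{$elems[1]}++;
}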

foreach my $rootpath ( sort keys %seen ) {
  foreach my $secondpath ( sort keys %{$seen{$rootpath}} ) {
    print "//" . $rootpath . "/" . $secondpath . "\n";
  }
}

If you only want to print out paths that have been seen twice or more, insert a next unless $seen{$rootpath}{$secondpath} > 1; before the print().

I haven't tested this so there could be syntax errors, but the code gives the general gist.
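
For a 2M-line file it may be worth counting prefixes while reading line by line, so the whole list never sits in memory. Here is a minimal sketch of that idea. The helper name count_prefixes is made up for illustration, the input is an in-memory string standing in for the real text file, and it assumes every useful path has at least two folder levels after the leading //:

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each two-level prefix occurs,
# reading one line at a time from the given filehandle.
sub count_prefixes {
    my ($fh) = @_;
    my %count;
    while ( my $line = <$fh> ) {
        chomp $line;
        # Capture "//first/second"; skip lines that are too shallow.
        next unless $line =~ m{^(//[^/]+/[^/]+)};
        $count{$1}++;
    }
    return \%count;
}

# Demo input held in a string; a real run would open the text file instead.
my $input = join "\n",
    '//folder1/folder2/folder2',
    '//folder1/folder2/folder5',
    '//folder1/folder3/folder6',
    '//folderx/foldery/folder9',
    '//folderx/foldery/folder10',
    '';
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";
my $count = count_prefixes($fh);
close $fh;

# Keep only prefixes seen at least twice, in sorted order.
my @repeated = grep { $count->{$_} >= 2 } sort keys %$count;
print "$_\n" for @repeated;
```

With the sample paths this keeps //folder1/folder2 and //folderx/foldery and drops //folder1/folder3, which only appears once. Drop the grep if you want every prefix regardless of count.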


How about:

#!/usr/local/bin/perl 
use strict;
use warnings;
use 5.010;

my %out;
while(<DATA>) {
    chomp;
    # Skip lines without two folder levels, so a stale $1 is never reused.
    next unless m#^(//[^/]+/[^/]+)#;
    $out{$1} = 1;
}
say for keys %out;

__DATA__
//folder1/folder2/folder2
//folder1/folder2/folder5
//folder1/folder3/folder6
//folderx/foldery/folder9
//folderx/foldery/folder10

output (hash key order is not guaranteed, so the lines may come out in a different order):

//folderx/foldery
//folder1/folder3
//folder1/folder2