I am trying to figure out a way to organize files in a folder with Perl. The basis of my script is fairly simple: it reads the subfolders in the root folder, goes through each one, and compares the first 3 digits in each file name to the 3-digit folder name. If they do not match, it moves the file to the right place.
My problem is that while looking through the files I sometimes run into a duplicate order number. Because the order already exists I cannot overwrite the original, and two files with the same name cannot exist in the same place. So I came up with the idea of appending _TEMP to the end of the file name so that the files can be moved now and renamed later. The problem I am running into now is when I already have two duplicates. I am looking for a way to make the TEMP tag increment by 1 every time it is used and then reset to zero each time the loop starts again. I am just not really sure where I should implement this idea.
Here is the main routine for the script:
foreach my $office (keys %office_names) {
    make_junk_folder($office);
    # The matches and unmatches come back as array references!
    my ($returned_matches, $returned_unmatches) = read_root_folder($office);
    foreach my $folder (@$returned_matches) {
        my $returned_files = read_subfolder($office, $folder);
        foreach my $file (@$returned_files) {
            analyze_file($office,$folder,$file);
        }
    }
    foreach my $folder (@$returned_unmatches) {
        print "$folder\n";
        remove_root_junk($office,$folder);
    }
}
Here is the subroutine that handles the file moving and renaming:
sub analyze_file {
    my $office = shift;
    my $folder = shift;
    my $file = shift;
    my $order_docs_path = $office_names{$office};
    if ($file =~ /^(?<office> (C[AFL]|ME)) (?<folder_num> \d{3})
                  (?<file_num> \d{3}) ([_|\-] \d+)? \. (?<file_ext> pdf)
                  $/xmi) {
        my $file_office = uc($+{office});
        my $folder_num = $+{folder_num};
        my $file_num = $+{file_num};
        my $file_ext = lc($+{file_ext});
        # Change hyphens to an underscore
        $file_num =~ s/\-/_/;
        my $file_name = "$file_office" . "$folder_num" . "$file_num" .
                        "\." . "$file_ext";
        my $temp_name = "$file_office" . "$folder_num" . "$file_num" .
                        "_TEMP" . "\." . "$file_ext";
        if ($folder != $folder_num) {
            # If the folder does not exist create the folder
            if (! -e "$order_docs_path\\$folder_num") {
                system "mkdir $order_docs_path\\$folder_num";
            }
            # Check to see if the file already exists
            if ( -e "$order_docs_path\\$folder_num\\$file_name") {
                # Append the file with a "_TEMP". These files are
                # missorted pages belonging to a larger document
                rename ("$order_docs_path\\$folder\\$file",
                        "$order_docs_path\\$folder_num\\$temp_name");
            } else {
                # Moves the file to correct place, these are mismatched files
                rename ("$order_docs_path\\$folder\\$file",
                        "$order_docs_path\\$folder_num\\$file_name");
            }
        } else {
            # Files are in the correct place, the file name will be
            # corrected only
            rename ("$order_docs_path\\$folder\\$file",
                    "$order_docs_path\\$folder_num\\$file_name");
        }
    }
}
Some example filenames look like this:
CF100145.pdf
CA310244.pdf
CL211745.pdf
CL211745_1.pdf (This denotes a second page, from our document scanner)
ME102103.pdf
The problems occur when a job is changed and, rather than putting the file where it belongs, the person places the updated job information into the current job folder, so the directory no longer matches the first 3 digits in the file name. The files then have to be sorted later to fix the errors. There are over 500,000 documents in just one office, and we have 4 offices.
I would change this part:
if ( -e "$order_docs_path\\$folder_num\\$file_name") {
    # Append the file with a "_TEMPn". These files are
    # missorted pages belonging to a larger document
    my $n = 1;
    while (-e "$order_docs_path\\$folder_num\\$temp_name") {
        $temp_name =~ s/TEMP\d*/TEMP$n/;
        $n++;
    }
    rename ("$order_docs_path\\$folder\\$file",
            "$order_docs_path\\$folder_num\\$temp_name");
} # ...
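If you would rather keep that loop out of analyze_file, the same idea could be pulled into a small helper. This is only a sketch: next_temp_name and its parameters are names I made up for illustration, not part of your script. Because $n is a lexical declared inside the sub, it starts from zero on every call, which gives you the "reset each time" behaviour without any extra bookkeeping.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not in the original script): return the first name of
# the form BASE_TEMP.ext, BASE_TEMP1.ext, BASE_TEMP2.ext, ... that does not
# yet exist in $dir.
sub next_temp_name {
    my ($dir, $base, $ext) = @_;
    my $n = 0;    # lexical, so every call starts counting from zero
    while (1) {
        # The first candidate has no number, matching the plain "_TEMP" tag
        my $suffix    = $n ? "_TEMP$n" : "_TEMP";
        my $candidate = "$base$suffix.$ext";
        return $candidate
            unless -e File::Spec->catfile($dir, $candidate);
        $n++;
    }
}
```

Inside analyze_file you would then call something like next_temp_name("$order_docs_path\\$folder_num", "$file_office$folder_num$file_num", $file_ext) and pass the result to rename. Note that there is still a small window between the -e check and the rename, so this assumes only one copy of the script is running against a folder at a time.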
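You could append _TEMP_OLD_FOLDER_NAME to it instead, using the name of the folder the file came from.
Each source folder held only one file with that name, so no two moved files will end up with the same new name.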
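A rough sketch of what that could look like, with a made-up helper name (temp_name_from_folder) and the same name parts that analyze_file already extracts. It assumes each source folder contributes at most one file per cleaned-up name; if that does not hold, two files could still collide.

```perl
use strict;
use warnings;

# Hypothetical helper: tag a displaced file with the folder it was found in,
# e.g. CL211745.pdf found in folder 100 becomes CL211745_TEMP_100.pdf.
sub temp_name_from_folder {
    my ($file_office, $folder_num, $file_num, $file_ext, $old_folder) = @_;
    return "$file_office$folder_num${file_num}_TEMP_$old_folder.$file_ext";
}
```

In analyze_file the last argument would be $folder, the folder the file is currently sitting in. A side benefit is that the tag tells you where each duplicate came from when you rename them later.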