How can I match files by file size and rename accordingly?

Developer https://www.devze.com 2023-03-28 07:31 Source: Internet
I have two directories of images with mismatching names, but mostly matching images.

Dir 1       Size   | Dir 2                  Size
---------------------------------------------------
img1.jpg    508960 | a_image_name.jpg       1038644
img2.jpg    811430 | another_image_name.jpg 396240
...         ...    | ...                    ...
img1000.jpg 602583 | image_name.jpg         811430
...         ...    | 
img2000.jpg 396240 | 

The first directory has more images, but they are misnamed. The second directory has the correct names, but its files are not in the same order as those in the first directory.

I'd like to rename files in Dir 1 by comparing file size (or some other way) to Dir 2. In the above example img2.jpg would be renamed to image_name.jpg because both have the same file size.

Can you point me in the right direction?

Preferably by way of a Mac app, a shell script, or PHP.


Maybe it would be wiser to use hashes of the files instead of using the filesize?
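To illustrate why: two different files can have exactly the same size, while a content hash tells them apart. A minimal sketch, where size_of, hash_of, same_size and same_content are hypothetical helper names (not from the question), and md5sum (Linux) or md5 -q (macOS) is assumed to be installed:

```shell
# Hypothetical helper: file size in bytes (tr strips the padding some wc versions add).
size_of() { wc -c < "$1" | tr -d ' '; }

# Hypothetical helper: MD5 digest of a file, using whichever tool is available.
hash_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d' ' -f1
    else
        md5 -q "$1"
    fi
}

# Succeed when the two files have the same size / the same content hash.
same_size()    { [ "$(size_of "$1")" = "$(size_of "$2")" ]; }
same_content() { [ "$(hash_of "$1")" = "$(hash_of "$2")" ]; }
```

For example, `same_size a.jpg b.jpg && ! same_content a.jpg b.jpg && echo "size collision"` would flag a pair that a size-only comparison would wrongly treat as the same image.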

In short: use glob() to get a list of the files in dir1, then for each file compute an MD5 hash (md5() + file_get_contents()) and store it in an array, using the hash as key and the filename as value. Do the same for dir2.

Then iterate over the first array; if an entry with the same hash exists in the second array, rename the file.

Code will be something like this: (untested, unoptimized)

$dir1 = array();
$dir2 = array();

// get hashes for dir1
foreach( glob( '/path/to/dir1/*.jpg' ) as $file ) {
 $hash = md5( file_get_contents( $file ) );
 $dir1[ $hash ] = $file;
}

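// get hashes for dir2
foreach( glob( '/path/to/dir2/*.jpg' ) as $file ) {
 $hash = md5( file_get_contents( $file ) );
 $dir2[ $hash ] = $file;
}

// rename each dir1 file to the name of its dir2 match, keeping it in dir1
foreach( $dir1 as $hash => $file1 ) {
 if( array_key_exists( $hash, $dir2 ) ) {
  $target = dirname( $file1 ) . '/' . basename( $dir2[ $hash ] );
  rename( $file1, $target );
 }
}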
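
Since the question also mentions shell, the same idea can be sketched there. This is only an illustration under some assumptions: plan_renames and hash_of are hypothetical names, md5sum (Linux) or md5 -q (macOS) is assumed to be available, and filenames are assumed to contain no tabs, newlines or single quotes. It only prints mv commands, so nothing is renamed until you run its output.

```shell
# Hypothetical helper: MD5 digest of a file, using whichever tool is available.
hash_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum < "$1" | cut -d' ' -f1
    else
        md5 -q "$1"
    fi
}

# plan_renames SRC_DIR REF_DIR
# Print one mv command for each *.jpg in SRC_DIR whose content matches a
# *.jpg in REF_DIR, giving it the reference file's name inside SRC_DIR.
plan_renames() {
    src=$1
    ref=$2
    index=$(mktemp) || return 1

    # Index the reference directory as "hash<TAB>filename" lines.
    for g in "$ref"/*.jpg; do
        [ -f "$g" ] || continue
        printf '%s\t%s\n' "$(hash_of "$g")" "$(basename "$g")"
    done > "$index"

    for f in "$src"/*.jpg; do
        [ -f "$f" ] || continue
        h=$(hash_of "$f")
        new=$(awk -F '\t' -v h="$h" '$1 == h { print $2; exit }' "$index")
        [ -n "$new" ] || continue                     # no match in REF_DIR
        [ "$(basename "$f")" = "$new" ] && continue   # already correctly named
        [ -e "$src/$new" ] && continue                # would overwrite a file
        printf "mv -- '%s' '%s'\n" "$f" "$src/$new"
    done
    rm -f "$index"
}
```

Run `plan_renames /path/to/dir1 /path/to/dir2` to review the plan, then pipe the same command to sh once it looks right.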


Here is my solution, which renames files in dir1 based on file size.

Contents of dir1:

-rw-r--r--  1 haiv  staff   10 Aug 16 13:18 file1.txt
-rw-r--r--  1 haiv  staff   20 Aug 16 13:18 file2.txt
-rw-r--r--  1 haiv  staff   30 Aug 16 13:18 file3.txt
-rw-r--r--  1 haiv  staff  205 Aug 16 13:18 file4.txt

(Note the fifth column stores the file sizes.) And the contents of dir2:

-rw-r--r--  1 haiv  staff   30 Aug 16 13:18 doc.txt
-rw-r--r--  1 haiv  staff  205 Aug 16 13:18 dopey.txt
-rw-r--r--  1 haiv  staff   20 Aug 16 13:18 grumpy.txt
-rw-r--r--  1 haiv  staff   10 Aug 16 13:18 happy.txt

Create a file called ~/rename.awk (yes, in the home directory, to avoid polluting either dir1 or dir2):

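/^total/ {next} # Skip the "total" line that ls -l prints first

# First input (dir2): remember the correct name for each file size
NR == FNR {
    name[$5] = $NF
    print "# File of size", $5, "should be named", $NF
    next
}

# Second input (dir1): emit a mv command only if dir2 had a file of this size
($5 in name) {
    printf "mv '%s' '%s'\n", $NF, name[$5]
}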

Now, cd into dir1 (if you want to rename files in dir1), and issue the following command:

$ awk -f ~/rename.awk <(ls -l ../dir2) <(ls -l)

Output:

# File of size 30 should be named doc.txt
# File of size 205 should be named dopey.txt
# File of size 20 should be named grumpy.txt
# File of size 10 should be named happy.txt
mv 'file1.txt' 'happy.txt'
mv 'file2.txt' 'grumpy.txt'
mv 'file3.txt' 'doc.txt'
mv 'file4.txt' 'dopey.txt'

Once you are happy with the result, pipe the above command to sh to execute the changes:

$ awk -f ~/rename.awk <(ls -l ../dir2) <(ls -l) | sh

Notes:

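  1. No safeguard against different files that happen to have the same size. For that, the MD5 solution which wonk0 offered works better.
  2. Please examine the output before you commit. Changes are permanent.
  3. The script takes the last whitespace-separated field of the ls -l output as the filename, so names containing spaces or single quotes are not handled.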
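
The risk in note 1 can be checked before renaming anything. A minimal sketch, where size_of and ambiguous_sizes are hypothetical names; it prints every file size that more than one file in a directory shares:

```shell
# Hypothetical helper: file size in bytes (tr strips the padding some wc versions add).
size_of() { wc -c < "$1" | tr -d ' '; }

# ambiguous_sizes DIR
# Print each size (in bytes) that occurs more than once among DIR's files.
ambiguous_sizes() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        size_of "$f"
    done | sort -n | uniq -d
}
```

If `ambiguous_sizes ../dir2` or `ambiguous_sizes .` prints anything, matching by size alone cannot tell those files apart, and the hash-based approach is the safer choice.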