Python method or class to compare two video files?

Developer https://www.devze.com 2023-02-08 00:33 Source: Internet

I'm trying to write a program in Python to compare files and show the duplicates. Does anyone know of any good functions or methods related to this? I am sorta lost...


If you're just looking for exact duplicates, compute a cryptographic hash of both files (SHA-512 here; MD5 would also work) and see if they match:

import hashlib

# Read both files in binary mode; text mode can fail on or alter video data.
with open('file1.avi', 'rb') as f:
    file1 = f.read()
with open('file2.avi', 'rb') as f:
    file2 = f.read()

if hashlib.sha512(file1).hexdigest() == hashlib.sha512(file2).hexdigest():
    print('They are the same')
else:
    print('They are different')
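
Video files can be large, so reading each one fully into memory may not be practical. A minimal sketch of the same hash comparison that reads in chunks instead; the helper names `hash_file` and `same_content` and the 1 MiB chunk size are assumptions chosen for illustration:

```python
import hashlib

def hash_file(path, chunk_size=1024 * 1024):
    """Return the SHA-512 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha512()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read until it returns b''.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()

def same_content(path_a, path_b):
    """True if both files hash to the same digest."""
    return hash_file(path_a) == hash_file(path_b)
```

Calling `same_content('file1.avi', 'file2.avi')` should give the same answer as the snippet above while holding only one chunk per file in memory at a time.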

If you need more than exact duplicates, I'd try OpenCV's Python bindings and check whether the videos match frame by frame.
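
A rough sketch of what a frame-by-frame check might look like. It assumes the `opencv-python` and `numpy` packages are installed. The helper names `read_frames`, `same_frames` and `videos_match` are hypothetical. Requiring identical pixels in every frame is also an assumption; it is the strictest possible test.

```python
from itertools import zip_longest

def read_frames(path):
    """Yield decoded frames from a video file using OpenCV."""
    import cv2  # assumes the opencv-python package is installed
    capture = cv2.VideoCapture(path)
    try:
        if not capture.isOpened():
            raise IOError('could not open video: %s' % path)
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or decode failure
                break
            yield frame
    finally:
        capture.release()

def same_frames(frames_a, frames_b, frame_equal):
    """True if both sequences have equal length and pairwise-equal frames."""
    missing = object()
    for a, b in zip_longest(frames_a, frames_b, fillvalue=missing):
        if a is missing or b is missing:
            return False  # one video has more frames than the other
        if not frame_equal(a, b):
            return False
    return True

def videos_match(path_a, path_b):
    """Compare two videos frame by frame, requiring identical pixels."""
    import numpy as np  # OpenCV returns frames as NumPy arrays
    return same_frames(read_frames(path_a), read_frames(path_b), np.array_equal)
```

Copies that were re-encoded or resized will generally not decode to identical pixels. For those, `frame_equal` would need to be swapped for a tolerance-based comparison.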


I would use os.walk to go through the file tree.

For each file, I would store its absolute path, indexed by file size and by a signature (the first 16 bytes? a hash of the first 512 bytes? a hash of the full file?).

When finished, you end up with a dict of file sizes; for each size, a dict of file signatures; for each signature, a list of all files sharing that signature. If your file signature is not based on the full file, or has a significant chance of collisions, you can then do a more in-depth comparison of just those colliding files.
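
The nested index described above could be sketched roughly as follows. The signature here is a SHA-256 hash of the first 512 bytes, which is just one of the options mentioned. The function names `signature`, `index_tree` and `candidate_duplicates` are made up for illustration.

```python
import hashlib
import os
from collections import defaultdict

def signature(path, length=512):
    """Hash of the first `length` bytes: a cheap, collision-prone signature."""
    with open(path, 'rb') as f:
        return hashlib.sha256(f.read(length)).hexdigest()

def index_tree(root):
    """Map file size -> signature -> list of absolute paths under root."""
    index = defaultdict(lambda: defaultdict(list))
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.abspath(os.path.join(dirpath, name))
            if not os.path.isfile(path):
                continue  # skip broken symlinks and other non-regular entries
            index[os.path.getsize(path)][signature(path)].append(path)
    return index

def candidate_duplicates(index):
    """Yield each list of paths that share both size and signature."""
    for by_signature in index.values():
        for paths in by_signature.values():
            if len(paths) > 1:
                yield paths
```

Because this signature only covers the start of each file, the groups yielded by `candidate_duplicates(index_tree(root))` are candidates only. They still need a full-content comparison before being reported as duplicates.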


I would start by comparing file names and file sizes. If you find a match, you could then loop through the bytes of both files to compare them, although this is probably expensive for large files.

I do not know of a library that can do this in python.
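
For the byte-level step, the standard library's `filecmp` module may be enough. A small sketch, assuming a size check followed by a full content comparison is what is wanted; the wrapper name `identical` is hypothetical:

```python
import filecmp
import os

def identical(path_a, path_b):
    """True if both files have the same size and the same bytes."""
    # Cheap check first: files of different sizes cannot be identical.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    # shallow=False forces a content comparison instead of trusting os.stat() data.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

File names are left out of the check on purpose, since duplicates often live under different names.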
