Python library to detect if a file has changed between different runs?

Developer https://www.devze.com 2022-12-13 23:25 Source: Internet
Suppose I have a program A. I run it, and it performs some operation starting from a file foo.txt. Then A terminates.

Now I run A again. It checks whether the file foo.txt has changed. If the file has changed, A runs its operation again; otherwise, it quits.

Does a library function or an external library for this exist?

Of course, it can be implemented with an md5 hash plus a file/db that stores the hash. I want to avoid reinventing the wheel.


It's unlikely that someone made a library for something so simple. Solution in 13 lines:

import pickle
import md5  # Python 2 only; this module was removed in Python 3
try:
    l = pickle.load(open("db"))
except IOError:
    l = []  # no db yet: first run
db = dict(l)
path = "/etc/hosts"
checksum = md5.md5(open(path).read())  # a hash object, not its digest string
if db.get(path, None) != checksum:
    print "file changed"
    db[path] = checksum
pickle.dump(db.items(), open("db", "w"))


FYI, for those using this example who got the error "TypeError: can't pickle HASH objects": simply modify it as follows (and optionally switch from md5 to hashlib, since the md5 module is deprecated):

    import pickle
    import hashlib #instead of md5
    try:
        l = pickle.load(open("db"))
    except IOError:
        l = []
    db = dict(l)
    path = "/etc/hosts"
    # hexdigest() returns the hash as a plain string, which can be pickled
    checksum = hashlib.md5(open(path).read()).hexdigest()
    if db.get(path, None) != checksum:
        print "file changed"
        db[path] = checksum
    pickle.dump(db.items(), open("db", "w"))

So just change:

    checksum = hashlib.md5(open(path).read())

to

    checksum = hashlib.md5(open(path).read()).hexdigest()
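
For readers on Python 3, the same idea can be sketched as below. This is a minimal port under stated assumptions, not the original poster's code: the function name `has_changed` and the `db_path` parameter are hypothetical, files are opened in binary mode because `hashlib` works on bytes, and the dict is pickled directly rather than as a list of items.

```python
import hashlib
import pickle


def has_changed(path, db_path="db"):
    """Return True if `path` differs from the checksum stored on the last call.

    Hypothetical helper: the name and the `db_path` default are assumptions.
    """
    try:
        with open(db_path, "rb") as db_file:
            db = pickle.load(db_file)
    except FileNotFoundError:
        db = {}  # first run: nothing recorded yet

    with open(path, "rb") as target:
        # hexdigest() yields a plain str, which pickles without error
        checksum = hashlib.md5(target.read()).hexdigest()

    changed = db.get(path) != checksum
    if changed:
        # Record the new checksum so the next run compares against it
        db[path] = checksum
        with open(db_path, "wb") as db_file:
            pickle.dump(db, db_file)
    return changed
```

Calling `has_changed("foo.txt")` at the start of A and quitting when it returns `False` would give the behaviour asked for in the question. MD5 is kept only to match the answers above; another `hashlib` algorithm such as `hashlib.sha256` can be swapped in the same way.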


This is one of those things that is both so trivial to implement and so app-specific that there really wouldn't be any point in a library. Any library intended for this purpose would grow so unwieldy trying to adapt to the many variations required that learning and using it would take as much time as implementing it yourself.


Can't we just check the last-modified date? That is, after the first operation we store the file's last-modified date in the db, and before running again we compare the last-modified date of foo.txt with the value stored in the db. If they differ, we perform the operation again.
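
A minimal sketch of that last-modified idea, in Python 3. The function name `modified_since_last_run` and the JSON file `mtimes.json` are hypothetical choices for illustration; only `os.stat` and `json` from the standard library are used.

```python
import json
import os


def modified_since_last_run(path, db_path="mtimes.json"):
    """Return True if the modification time of `path` differs from the stored one.

    Hypothetical helper: the name and the `db_path` default are assumptions.
    """
    try:
        with open(db_path) as db_file:
            db = json.load(db_file)
    except FileNotFoundError:
        db = {}  # first run: nothing recorded yet

    # st_mtime_ns is an integer nanosecond timestamp, so it compares exactly
    mtime = os.stat(path).st_mtime_ns

    changed = db.get(path) != mtime
    if changed:
        # Remember the new timestamp for the next run
        db[path] = mtime
        with open(db_path, "w") as db_file:
            json.dump(db, db_file)
    return changed
```

One caveat with this approach: the timestamp changes whenever the file is rewritten or touched, even if the contents are identical, and a tool that restores the old timestamp can hide a real change. Comparing checksums avoids both cases at the cost of reading the whole file.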
