I need to copy files from a set of CDs that have a lot of duplicate content, with each other, and with what's already on my hard disk. The file names of identical files are not the same, and are in sub-directories of different names. I want to copy non-duplicate files from the CD into a new directory on the hard disk. I don't care about the sub-directories - I 开发者_如何学Cwill sort it out later - I just want the unique files.
I can't find software to do that - see my post at SuperUser https://superuser.com/questions/129944/software-to-copy-non-duplicate-files-from-cd-dvd
Someone at SuperUser suggested I write a script using GNU's "find" and the Win32 version of some checksum tools. I glanced at that, and have not done anything like that before. I'm hoping something exists that I can modify.
I found a good program to delete duplicates, Duplicate Cleaner (it compares checksums), but it won't help me here, as I'd have to copy all the CDs to disk, and each is probably about 80% duplicates, and I don't have room to do that - I'd have to cycle through a few at a time copying everything, then turning around and deleting 80% of it, working the hard drive a lot.
Thanks for any help.
I don't use Windows, but I'll give a suggestion: a combination of GNU find
and a Lua script. For find
you can try
find / -exec md5sum '{}' ';'
If your GNU software includes xargs
the following will be equivalent but may be significantly faster:
find / -print0 | xargs -0 md5sum
This will give you a list of checksums and corresponding filenames. We'll throw away the filenames and keep the checksums:
#!/usr/bin/env lua
local checksums = {}
for l in io.lines() do
local checksum, pathname = l:match('^(%S+)%s+(.*)$')
checksums[checksum] = true
end
local cdfiles = assert(io.popen('find e:/ -print0 | xargs -0 md5sum'))
for l in cdfiles:lines() do
local checksum, pathname = l:match('^(%S+)%s+(.*)$')
if not checksums[checksum] then
io.stderr:write('copying file ', pathname, '\n')
os.execute('cp ' .. pathname .. ' c:/files/from/cd')
checksums[checksum] = true
end
end
You can then pipe the output from
find / -print0 | xargs -0 md5um
into this script.
There are a few problems:
If the filename has special characters, it will need to be quoted. I don't know the quoting conventions on Windows.
It would more efficient to write the checksums to disk rather than to run find all the time. You could try
local csums = assert(io.open('/tmp/checksums', 'w')) for cs in pairs(checksums) do csums:write(cs, '\n') end csums:close()
And then read checksums back in from the file using
io.lines
again.
I hope this is enough to get you started. You can download Lua from http://lua.org, and I recommend the superb book Programming in Lua (check out the previous edition free online).
精彩评论