开发者

Comparing 2700+ filenames against 29000+ rows in a db table

开发者 https://www.devze.com 2022-12-19 08:58 出处:网络
I have a ftp repository that is currently at 2761 files (PDF files). I have a MySQL table (a list of those files) that\'s actually at 29k+ files (it hasn\'t been parsed since a recent upgrade).

I have a ftp repository that is currently at 2761 files (PDF files). I have a MySQL table (a list of those files) that's actually at 29k+ files (it hasn't been parsed since a recent upgrade). I'd like to provide admins with a one-click script that will do the following:

  • 1) Compare the "existing" filenames with the rows in the database table
  • 2) Delete any rows that are not in the existing filesystem
  • 3) Add a new row for a file that doesn't appear in the database table

This usually is handled by an AppleScript/FolderAction/Perl script method, but it's not perfect (it chokes sometimes when large numbers of files are added at a time - like on heavy news nights).

It takes about 10-20 seconds to build the file list from 开发者_如何学Gothe FTP repository (using $file_list = ftp_nlist($conn_id,$target_dir) ), and I'm not sure how to best go about comparing with the DB table (I'm positive that an WHERE NOT IN (big_fat_list) would be a nightmare query to run).

Any suggestions?


Load the list of filenames into another table, then perform a couple of queries that fulfill your requirements.


Yup that is the solution. I propose you to use pdo prepared insert statement to reduce time. or do what mysqldump does, generate insert into table(column1,column2, ... ) values(), (), (), ... ; insert into ...

you would have to check the maximun values list in mysql site.


I usually dump the recursive directory list with dates and file sizes to a temporary table. Then I remove items not found:

delete
from A
where not exists (
    select null as nothing
    from   temp b
    where  a.key = b.key )

Then I update items already there (for file sizes, CRCs):

update a set nonkeyfield1 = b.nonkeyfield1, nonkeyfield2 = b.nonkeyfield2
from   a join temp b on a.key = b.key

Then I add items found:

insert into A ( field, list)
select field, list
from   temp b
where  not exists (
    select null as nothing
    from   A
    where  b.key = a.key )

This is from memory, so test it first before you fly. The select null as nothing keeps you from wasting RAM while you check things.

0

精彩评论

暂无评论...
验证码 换一张
取 消