开发者

SQL: Remove rows whose associations are broken (orphaned data)

开发者 https://www.devze.com 2023-03-05 17:05 出处:网络
I have a table called \"downloads\" with two foreign key columns --\"user_id\" and \"item_id\".I need to select all rows from that table and remove the rows where the User or the Item in question no l

I have a table called "downloads" with two foreign key columns -- "user_id" and "item_id". I need to select all rows from that table and remove the rows where the User or the Item in question no longer exists. (Look up th开发者_Go百科e User and if it's not found, delete the row in "downloads", then look up the Item and if it's not found, delete the row in "downloads").

It's 3.4 million rows, so all my scripted solutions have been taking 6+ hours. I'm hoping there's a faster, SQL-only way to do this?


use two anti joins and or them together:

delete from your_table
where user_id not in (select id from users_table)
or item_id not in (select id from items_table)

once that's done, consider adding two foreign keys, each with an on delete cascade clause. it'll do this for you automatically.


delete from your_table where user_id not in (select id from users_table) or item_id not in (select id from items_table)


think there is no faster solution when there are so many rows that are on your server 157 rows per second

check user id if mysql num rows = 0 than delete the downloads and also check the item_id

there was also a similar question about the performance of myswl num rows

MySQL: Fastest way to count number of rows

edit: think the best is to creatse some triggers so the database server does the job for you

currently i would use a cronjob for the first time


For future reference. For these kind of long operations. It is possible to optimise the server independently of the SQL. For example detach the sql service, defrag the system disk, if you can ensure the sql log files are on separate disk drive to the drive where database is. This will at least reduce the pain of these kind of long operations.


I've found in SQL 2008 R2, if your "in" clause contains a null value (perhaps from a table who has a reference to this key that is nullable), no records will be returned! To correct, just add a clause to your selects in the union part:

delete from SomeTable where Key not in (
  select SomeTableKey from TableB where SomeTableKey is not null
  union
  select SomeTableKey from TableC where SomeTableKey is not null
)
0

精彩评论

暂无评论...
验证码 换一张
取 消