Delete large portion of huge tables

I have a very large table (more than 300 million records) that will need to be cleaned up. Roughly 80% of it will need to be deleted. The database software is MS SQL 2005. There are several indexes and statistics on the table, but no external relationships.

The best solution I have come up with so far is to put the database into "simple" recovery mode, copy all the records I want to keep into a temporary table, truncate the original table, enable IDENTITY_INSERT, and copy the data back from the temp table.

It works, but it still takes several hours to complete. Is there a faster way to do this?
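For reference, a minimal sketch of the approach described above; the database, table, and column names (MyDb, BigTable, KeepTable, Id, KeepFlag, Col1, Col2) are placeholders, not the real schema:

ALTER DATABASE MyDb SET RECOVERY SIMPLE;

-- SELECT ... INTO is minimally logged under SIMPLE recovery
SELECT *
INTO   dbo.KeepTable
FROM   dbo.BigTable
WHERE  KeepFlag = 1;              -- whatever identifies the ~20% to keep

TRUNCATE TABLE dbo.BigTable;

-- copy the kept rows back, preserving identity values
SET IDENTITY_INSERT dbo.BigTable ON;

INSERT INTO dbo.BigTable (Id, Col1, Col2)
SELECT Id, Col1, Col2
FROM   dbo.KeepTable;

SET IDENTITY_INSERT dbo.BigTable OFF;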


As per the comments, my suggestion would be to simply dispense with the copy-back step and promote the table containing the records to be kept to become the new main table, by renaming it (sketched after the list below).

It should be quite straightforward to script out the index/statistics creation to be applied to the new table before it gets swapped in.

The clustered index should be created before the non-clustered indexes.

A couple of points I'm not sure about though.

  1. Whether it would be quicker to insert into a heap and then create the clustered index afterwards. (I guess no, if the insert can be done in clustered index order.)
  2. Whether the original table should be truncated before being dropped (I guess yes)
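A rough sketch of that swap, assuming the kept rows already live in a table called KeepTable and the original table is BigTable (both names, and the index columns, are placeholders); the indexes are scripted onto the new table first, clustered before non-clustered, then the rename happens:

CREATE CLUSTERED INDEX IX_KeepTable_Id
    ON dbo.KeepTable (Id);

CREATE NONCLUSTERED INDEX IX_KeepTable_Col1
    ON dbo.KeepTable (Col1);

-- swap the tables by renaming
EXEC sp_rename 'dbo.BigTable',  'BigTable_Old';
EXEC sp_rename 'dbo.KeepTable', 'BigTable';

-- per point 2 above: truncate, then drop the old table
TRUNCATE TABLE dbo.BigTable_Old;
DROP TABLE dbo.BigTable_Old;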


@uriDium -- Chunking using batches of 50,000 will escalate to a table lock, unless you have disabled lock escalation via ALTER TABLE (SQL Server 2008) or various other locking tricks.
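For reference, the SQL Server 2008 option being referred to looks like this (it is not available on 2005; the table name is a placeholder):

-- SQL Server 2008+: keep row/page locks during chunked deletes
ALTER TABLE dbo.BigTable SET (LOCK_ESCALATION = DISABLE);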


I am not sure what the structure of your data is. When does a row become eligible for deletion? If eligibility is purely ID- or date-based, then you can create a new table for each day, insert your new data into that day's table, and when it comes to cleaning simply drop the expired tables. Then, for any selects, construct a view over all the tables. Just an idea.

EDIT: (In response to comments) If you are maintaining a view over all the tables, then no, it won't be complicated at all. The complex part is coding the dropping and recreating of the view.
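A rough sketch of the per-day table idea, with invented names (Events_YYYYMMDD for the daily tables, dbo.Events for the view):

CREATE TABLE dbo.Events_20230101 (Id INT, EventDate DATETIME, Payload VARCHAR(100));
CREATE TABLE dbo.Events_20230102 (Id INT, EventDate DATETIME, Payload VARCHAR(100));
GO

-- one view unions the live daily tables so existing SELECTs keep working
CREATE VIEW dbo.Events
AS
    SELECT Id, EventDate, Payload FROM dbo.Events_20230101
    UNION ALL
    SELECT Id, EventDate, Payload FROM dbo.Events_20230102;
GO

-- cleanup becomes a metadata-only operation: drop the oldest table
-- and recreate the view without it
DROP TABLE dbo.Events_20230101;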

I am assuming that you don't want your data to be locked down too much during deletes. Why not chunk the delete operations? Create a stored procedure that will delete the data in chunks, 50,000 rows at a time. This should make sure that SQL Server keeps row locks instead of a table lock. Use

WAITFOR DELAY 'x'

in your while loop so that you can give other queries a bit of breathing room. Your problem is the age-old computer-science trade-off: space vs. time.
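A minimal sketch of such a chunked-delete loop on SQL Server 2005; the table name and the WHERE predicate are placeholders:

DECLARE @rows INT;
SET @rows = 1;

WHILE @rows > 0
BEGIN
    -- delete at most 50,000 rows per iteration
    DELETE TOP (50000)
    FROM   dbo.BigTable
    WHERE  KeepFlag = 0;          -- whatever identifies rows to delete

    SET @rows = @@ROWCOUNT;

    -- give other queries some breathing room between chunks
    WAITFOR DELAY '00:00:05';
END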
