I have a very large database (~100Gb) primarily consisting of two tables I want to reduce in size (both of which have approx. 50 million records). I have an archive DB set up on the same server with these two tables, using the same schema. I'm trying to determine the best conceptual way of going about removing the rows from the live db and inserting them in the archive DB. In pseudocode this is what I'm doing now:
Declare @NextIDs Table(UniqueID)
Declare @twoYearsAgo = two years from today's date
Insert into @NextIDs
SELECT top 100 from myLargeTable Where myLargeTable.actionDate < twoYearsAgo
Insert into myArchiveTable
<fields>
SELECT <fields>
FROM myLargeTable INNER JOIN @NextIDs on myLargeTable.UniqueID = @NextIDs.UniqueID
DELETE MyLargeTable
FROM MyLargeTable INNER JOIN @NextIDs on myLargeTable.UniqueID = @NextIDs.UniqueID
Right now this takes a horrifically slow 7 minutes to complete 1000 records. I've tested the Delete and the Insert, both taking approx. 3.5 minutes to complete, so its not necessarily one is drastically m开发者_运维问答ore inefficient than the other. Can anyone point out some optimization ideas in this?
Thanks!
This is SQL Server 2000.
Edit: On the large table there is a clustered index on the ActionDate field. There are two other indexes, but neither are referenced in any of the queries. The Archive table has no indexes. On my test server, this is the only query hitting the SQL Server, so it should have plenty of processing power.
Code (this does a loop in batches of 1000 records at a time):
DECLARE @NextIDs TABLE(UniqueID int primary key)
DECLARE @TwoYearsAgo datetime
SELECT @TwoYearsAgo = DATEADD(d, (-2 * 365), GetDate())
WHILE EXISTS(SELECT TOP 1 UserName FROM [ISAdminDB].[dbo].[UserUnitAudit] WHERE [ActionDateTime] < @TwoYearsAgo)
BEGIN
BEGIN TRAN
--get all records to be archived
INSERT INTO @NextIDs(UniqueID)
SELECT TOP 1000 UniqueID FROM [ISAdminDB].[dbo].[UserUnitAudit] WHERE [UserUnitAudit].[ActionDateTime] < @TwoYearsAgo
--insert into archive table
INSERT INTO [ISArchive].[dbo].[userunitaudit]
(<Fields>)
SELECT <Fields>
FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID
--remove from Admin DB
DELETE [ISAdminDB].[dbo].[UserUnitAudit]
FROM [ISAdminDB].[dbo].[UserUnitAudit] AS a
INNER JOIN @NextIDs AS b ON a.UniqueID = b.UniqueID
DELETE FROM @NextIDs
COMMIT
END
You effectively have three selects which need to be run before your insert/delete commands are executed:
for the 1st insert:
SELECT top 100 from myLargeTable Where myLargeTable.actionDate < twoYearsAgo
for the 2nd insert:
SELECT <fields> FROM myLargeTable INNER JOIN NextIDs
on myLargeTable.UniqueID = NextIDs.UniqueID
for the delete:
(select *)
FROM MyLargeTable INNER JOIN NextIDs on myLargeTable.UniqueID = NextIDs.UniqueID
I'd try and optimize these and if they are all quick, then the indexes may be slowing down your writes. Some suggestions:
start profiler and see what's happenng with the reads/writes etc.
check index usage for all three statements.
try running the
SELECTs
returning only the PK, to see if the delay is query execution or fetching the data (do have e.g. any fulltext-indexed fields,TEXT
fields etc.)
Do you have an index on the source table for the column which you're using to filter the results? In this case, that would be the actionDate.
Also, it can often help to remove all indexes from the destination table before doing massive inserts, but in this case you're only doing 100's at a time.
You would also probably be better off doing this in larger batches. With one hundred at a time the overhead of the queries is going to end up dominating the costs/time.
Is there any other activity on the server during this time? Is there any blocking happening?
Hopefully this gives you a starting point.
If you can provide the exact code that you're using (maybe without the column names if there are privacy issues) then maybe someone can spot other ways to optimize.
EDIT: Have you checked the query plan for your block of code? I've run into issues with table variables like this where the query optimizer couldn't figure out that the table variable would be small in size so it always tried to do a full table scan on the base table.
In my case it eventually became a moot point, so I'm not sure what the ultimate solution is. You can certainly add a condition on the actionDate to all of your select queries, which would at least minimize the effects of this.
The other option would be to use a normal table to hold the IDs.
The INSERT and DELETE statements are joining on
[ISAdminDB].[dbo].[UserUnitAudit].UniqueID
If there's no index on this, and you indicate there isn't, you're doing two table scans. That's likely the source of the slowness, b/c a SQL Server table scan reads the entire table into a scratch table, searches the scratch table for matching rows, then drops the scratch table.
I think you need to add an index on UniqueID
. The performance hit for maintaining it has got to be less than table scans. And you can drop it after your archive is done.
Are there any indexes on myLargeTable.actionDate and .UniqueID?
Have you tried larger batch sizes than 100?
What is taking the most time? The INSERT, or the DELETE?
You might try doing this using the output clause:
declare @items table (
<field list just like source table> )
delete top 100 source_table
output deleted.first_field, deleted.second_field, etc
into @items
where <conditions>
insert archive_table (<fields>)
select (<fields>) from @items
You also might be able to do this in a single query, by doing 'output into' directly into the archive table (eliminating the need for the table var)
精彩评论