
Is SQL Server smart enough to not do IO when it is not really needed?

https://www.devze.com 2023-04-01 21:10 (source: web)

Say I have a MERGE statement that looks like this:

merge TableA as target
using (select Id, Description, UnitCost
       from   TableB) 
       as source (Id, Description, UnitCost)
on (target.Id = source.Id)
when MATCHED then
    update set Id = source.Id, 
               Description = source.Description, 
               UnitCost = Source.UnitCost
when NOT MATCHED then
    insert (Id, Description, UnitCost)
    values (source.Id, source.Description, source.UnitCost);

When I run this, it tells me how many rows were affected. Even if I run it when I know the source and the destination are exactly the same, I still get a message telling me that x number of rows were affected; in my case it is about 200 rows. Is SQL Server re-writing the same data to disk?

200 rows is nothing and can easily be rewritten without impacting SQL Server's performance. But if I have a merge statement with 500,000+ rows and lots of indexes, then re-updating all the data in the table is going to get expensive.

Do I need to check that the data has changed first (at least in the cases where performance could be an issue)?

If so, how do I do that in a merge statement (maybe using my example above)?


merge TableA as target
using (select Id, Description, UnitCost
       from   TableB) 
       as source (Id, Description, UnitCost)
on (target.Id = source.Id)             
when MATCHED AND (target.Description <> source.Description OR target.UnitCost <> source.UnitCost) then
    update set Id = source.Id, 
               Description = source.Description, 
               UnitCost = Source.UnitCost
when NOT MATCHED then
    insert (Id, Description, UnitCost)
    values (source.Id, source.Description, source.UnitCost);

You can add a conditional search clause to the WHEN MATCHED branch; this checks that something has actually changed. It won't necessarily be faster, but it won't update rows that don't need to be updated.
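One caveat, assuming Description or UnitCost is nullable: `NULL <> NULL` evaluates to UNKNOWN, so the `<>` version silently skips rows where a value changes to or from NULL. A NULL-safe sketch using the common INTERSECT trick (set operators treat two NULLs as equal) would be:

```sql
merge TableA as target
using (select Id, Description, UnitCost
       from   TableB) 
       as source (Id, Description, UnitCost)
on (target.Id = source.Id)
-- NOT EXISTS ... INTERSECT is empty only when some column differs,
-- and it handles NULLs correctly, unlike plain <> comparisons
when MATCHED and not exists
    (select target.Description, target.UnitCost
     intersect
     select source.Description, source.UnitCost) then
    update set Description = source.Description, 
               UnitCost = source.UnitCost
when NOT MATCHED then
    insert (Id, Description, UnitCost)
    values (source.Id, source.Description, source.UnitCost);
```

The update also drops `Id = source.Id`, since the ON clause already guarantees the keys are equal.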

If you need more information, check the docs: MERGE (Transact-SQL).


SQL Server, and any buffer-pool, write-ahead-log based engine for that matter, will not do data IO synchronously for updates, deletes, or inserts. It has been like this since the ARIES paper was published, and almost all modern relational databases trace their ancestry to System R and ARIES.

When a row is updated (and that includes inserting and deleting a row), a log record describing the change is appended to a log buffer in memory, and then the page containing the row is updated in memory. Nothing is written to disk, and execution continues. When the transaction commits, a new log record is generated, and the commit cannot proceed until all the log in memory, up to and including the commit record, has been flushed to disk. This is the only mandatory IO required for the update to proceed. If you update 500k rows in one statement, the system only has to wait for one log flush, after all 500k rows have been updated.

The data in memory is periodically written to disk during checkpoints.
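The commit path described above can be sketched in a few lines of illustrative Python; the names (ToyEngine, checkpoint, etc.) are invented for this sketch and are not SQL Server internals:

```python
# Toy write-ahead-logging sketch: hypothetical names, not SQL Server internals.

class ToyEngine:
    def __init__(self):
        self.log_buffer = []      # log records in memory, not yet on disk
        self.log_on_disk = []     # durable log (the only mandatory IO)
        self.pages = {}           # buffer pool: row id -> value, in memory
        self.dirty = set()        # rows changed since the last checkpoint
        self.data_on_disk = {}    # data files; written only by checkpoints

    def update(self, row_id, value):
        # 1. Append a log record describing the change (memory only).
        self.log_buffer.append(("update", row_id, value))
        # 2. Change the page in the buffer pool (memory only, no data IO).
        self.pages[row_id] = value
        self.dirty.add(row_id)

    def commit(self):
        # Commit waits only for the log flush, however many rows changed.
        self.log_buffer.append(("commit",))
        self.log_on_disk.extend(self.log_buffer)  # the one mandatory IO
        self.log_buffer.clear()

    def checkpoint(self):
        # Dirty pages reach the data files later, independently of commits.
        for row_id in self.dirty:
            self.data_on_disk[row_id] = self.pages[row_id]
        self.dirty.clear()

engine = ToyEngine()
for i in range(5):
    engine.update(i, i * 10)      # five updates: zero disk IO so far
engine.commit()                   # one log flush covers all five rows
assert engine.data_on_disk == {}  # data files still untouched at commit
engine.checkpoint()
assert engine.data_on_disk[4] == 40
```

The point of the sketch is the assertion before `checkpoint()`: the transaction is durable as soon as the log is flushed, while the data pages reach disk on their own schedule.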
