My limited SQL knowledge prompted this post!
I have a stored procedure that runs a series of updates.
There are 6 update statements that could very well be mashed into one large update statement, but I don't know what that will mean for performance.
Table basics: 6 tables are hit. 3 of them will never have more than about 5,000 records; 3 of them grow indefinitely (currently around 1-2 million records, properly indexed, etc.). Is there a speed advantage to joining all the tables and doing one massive update statement vs. keeping it as 6 separate updates?
I'm looking for the most efficient way to do it, even if it only shaves a second off.
Thanks.
EDIT:
My apologies to all. I am only updating a single table, but verifying data from all 6 tables in order to update that one table (i.e., checking whether table2.somevalue is blank and, if so, flagging the record in table1 as "error").
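A minimal sketch of what that kind of consolidated check could look like - the join column (table1ID), key (ID), and flag column (ErrorFlag) are assumptions, since the real schema isn't posted:
-- flag table1 rows whose matching table2.somevalue is missing or blank
UPDATE t1
SET ErrorFlag = 'error'
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1ID = t1.ID
WHERE t2.somevalue IS NULL OR t2.somevalue = ''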
If it is possible to combine them, then yes, you should combine them.
If you are updating multiple columns from a single table, then combine them like:
UPDATE mytable
SET foo = bar, fizz = buzz, whiz = bang
WHERE zing > 2081
instead of
UPDATE mytable SET foo = bar WHERE zing > 2081
UPDATE mytable SET fizz = buzz WHERE zing > 2081
UPDATE mytable SET whiz = bang WHERE zing > 2081
That is nearly 3x as efficient, because it only has to go through the table once.
Updates of fields in different tables will have to be done separately (as Cade Roux mentioned, it's not even possible to do them at the same time).
Even if you are using a complex join, only a single table can be updated at a time. Something like:
UPDATE AccountHistory
SET LastPurchaseDate = i.PurchaseDate
FROM Account a
INNER JOIN AccountHistory h ON a.AccountID = h.AccountID
INNER JOIN Invoices i ON i.AccountID = a.AccountID
WHERE i.PurchaseDate > '2009-11-30'
Only AccountHistory is being changed here, even though the source data is accessed through a join.
Updating multiple tables at the same time in a single UPDATE statement isn't even possible in T-SQL.
From BOL:
table_name
Is the name of the table to update. The name can be qualified with the linked server, database, and owner name if the table is not in the current server or database, or is not owned by the current user.
and
view_name
Is the name of the view to update. The view referenced by view_name must be updatable. The modifications made by the UPDATE statement cannot affect more than one of the base tables referenced in the FROM clause of the view.
You can't get a good answer with the data supplied - for each update you need to know whether the criteria are indexed and what combining them would do to the update's complexity.
I think that 6 updates (in a transaction if necessary) are more understandable than 1 big update - so it is a better solution, and you can optimize each update to be as fast as possible.
JRud's comment is good - you can always try both ways and compare the times.
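As a rough sketch (with placeholder table, column, and key names, not the poster's real schema), the separate-updates approach wrapped in a single transaction might look like this:
BEGIN TRANSACTION

-- each statement performs one of the six validation checks
UPDATE table1
SET ErrorFlag = 'error'
WHERE NOT EXISTS (SELECT 1 FROM table2 WHERE table2.table1ID = table1.ID AND table2.somevalue <> '')

UPDATE table1
SET ErrorFlag = 'error'
WHERE NOT EXISTS (SELECT 1 FROM table3 WHERE table3.table1ID = table1.ID AND table3.othervalue <> '')

-- ...four more checks against the remaining tables...

COMMIT TRANSACTION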
Significant performance improvements can typically be achieved by grouping queries when the queries come from a [remote] client. In your situation, the multiple queries come from a stored procedure and hence are already local requests, so the benefit of combining them will not be significant, if there is any at all.
Furthermore, upon checking the SQL documentation, in the case of UPDATEs it isn't possible to write a query that modifies more than one table at a time...
However, since you appear to be seeking the most efficient way, you'll need to try various possibilities (under a representative set of data inputs) to see the effect of particular changes. In the absence of more detail about your specific situation, it is hard to offer more targeted information.
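For the "try it and measure" approach, one simple way is to capture timing and I/O statistics around each variant (the statistics switches are standard T-SQL; the procedure name here is a stand-in for yours):
SET STATISTICS TIME ON
SET STATISTICS IO ON

EXEC dbo.MyUpdateProc   -- run the six-statement version, note the elapsed times and reads

-- ...then run the combined version the same way and compare the output

SET STATISTICS TIME OFF
SET STATISTICS IO OFF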
One small and generic hint:
The mention of "properly indexed" in reference to the "big" tables is a reminder that when it comes to UPDATEs (and, more generally, write operations like deletes and inserts), indexes can be more of a liability than an asset, performance-wise.
Even if there is a speed advantage, I would not recommend modifying the basic table schema to increase performance by a small percentage. If a change would garner a large improvement in performance (like double or triple), then in situations where performance is critical it might be worthwhile. However:
With almost all modern database packages, there are numerous other techniques available to you which increase performance without changing the basic table schema. You can add indexes customized to the types of queries your application uses. Among other techniques, you can add indexed views (materialized views in Oracle). You can vertically and/or horizontally partition tables. In SQL Server you can "PIN" small tables, so that they remain in memory all the time.
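To illustrate just the first of those suggestions (an index customized to the query), here is a hedged sketch reusing the hypothetical column names from the earlier examples - whether it actually helps depends entirely on the real UPDATE criteria:
-- covering index so the "is somevalue blank?" check never has to touch the base table
CREATE NONCLUSTERED INDEX IX_table2_table1ID
ON table2 (table1ID)
INCLUDE (somevalue)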
Are you trying to improve the overall performance of the stored procedure or reduce the time the UPDATEs take?
In cases where I'm more worried about concurrency (reducing lock time) than overall performance, I'll SELECT the key values into variables or temporary tables. Then I use the variables or temporary tables in the UPDATE. That narrower focus can significantly improve the UPDATE's performance.
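A minimal sketch of that pattern, again using the placeholder names rather than the poster's actual tables:
-- capture just the keys that need flagging
SELECT t1.ID
INTO #ToFlag
FROM table1 t1
LEFT JOIN table2 t2 ON t2.table1ID = t1.ID
WHERE t2.somevalue IS NULL OR t2.somevalue = ''

-- then run a narrowly scoped UPDATE against those keys only
UPDATE t1
SET ErrorFlag = 'error'
FROM table1 t1
INNER JOIN #ToFlag f ON f.ID = t1.ID

DROP TABLE #ToFlag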
Caveat: using this method requires a good understanding of how your system works. The biggest risk with this method is in increasing the chances of deadlocks.