First, I know that the SQL statement to update table_a using values from table_b takes this form:
Oracle:
UPDATE table_a
SET (col1, col2) = (SELECT cola, colb
FROM table_b
WHERE table_a.key = table_b.key)
WHERE EXISTS (SELECT *
FROM table_b
WHERE table_a.key = table_b.key)
MySQL:
UPDATE table_a
INNER JOIN table_b ON table_a.key = table_b.key
SET table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
My understanding is that the database engine goes through the records in table_a and updates them with values from the matching records in table_b.
So, if I have 10 million records in table_a and only 10 records in table_b:
Does that mean the engine will do 10 million iterations through table_a just to update 10 records? Are Oracle, MySQL, etc. smart enough to do only 10 iterations through table_b?
Is there a way to force the engine to iterate through the records in table_b instead of table_a to do the update? Is there an alternative syntax for the SQL statement?
Assume that table_a.key and table_b.key are indexed.
Either engine should be smart enough to optimize the query based on the fact that there are only ten rows in table_b. How the engine decides what to do depends on factors such as indexes and statistics.
If the "key" column is the primary key and/or is indexed, the engine will have to do very little work to run this query. It can use the index to find the matching rows directly and look them up very quickly, so it won't have to "iterate" over the whole table.
If there is no index on the key column, the engine will have to do a "table scan" (roughly the equivalent of "iterate") to find the right values and match them up. This means it will have to scan through 10 million rows.
Do a little reading on what's called an Execution Plan. This is basically an explanation of what work the engine had to do in order to run your query (some databases show it in text only, some have the option of seeing it graphically). Learning how to interpret an Execution Plan will give you great insight into adding indexes to your tables and optimizing your queries.
Look these up if they don't work (it's been a while), but it's something like:
- In MySQL, put the word "EXPLAIN" in front of your SELECT statement
- In Oracle, run "SET AUTOTRACE ON" before you run your SELECT statement
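If you don't have Oracle or MySQL at hand, the index-versus-table-scan difference can be seen with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. SQLite's EXPLAIN QUERY PLAN plays the role of EXPLAIN / AUTOTRACE here; plan wording and optimizer choices differ between engines, so treat this as an illustration only. The helper name plan_for_update and the index name idx_a_key are made up for this example, and the UPDATE uses an IN subquery rather than the EXISTS form from the question.

```python
import sqlite3


def plan_for_update(with_index: bool) -> str:
    """Return SQLite's query plan for the UPDATE as one newline-joined string."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (key INTEGER, col1 TEXT, col2 TEXT)")
    conn.execute("CREATE TABLE table_b (key INTEGER, cola TEXT, colb TEXT)")
    if with_index:
        # Hypothetical index name; the question assumes the key columns are indexed.
        conn.execute("CREATE INDEX idx_a_key ON table_a (key)")
    rows = conn.execute(
        """
        EXPLAIN QUERY PLAN
        UPDATE table_a
        SET col1 = (SELECT cola FROM table_b WHERE table_b.key = table_a.key)
        WHERE key IN (SELECT key FROM table_b)
        """
    ).fetchall()
    conn.close()
    # The human-readable step description is the fourth column of each plan row.
    return "\n".join(row[3] for row in rows)


if __name__ == "__main__":
    print("With index on table_a.key:")
    print(plan_for_update(True))
    print()
    print("Without index on table_a.key:")
    print(plan_for_update(False))
```

With the index in place, the plan should report a SEARCH on table_a that uses idx_a_key, meaning the engine is driven by the few keys coming from table_b. Without it, the plan should report a SCAN of table_a, which is the "10 million iterations" case.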
I think the first (Oracle) query would be better written with a JOIN instead of a WHERE EXISTS. The engine may be smart enough to optimize it properly either way. Once you get the hang of interpreting an execution plan, you can run it both ways and see for yourself. :)
Okay, I know answering my own question is usually frowned upon, but I already accepted another answer and won't unaccept it, so here it is.
I've discovered a much better alternative that I'd like to share with anyone who encounters the same scenario: the MERGE statement.
Apparently, newer Oracle versions introduced this MERGE statement, which simply blew me away! Not only is the performance much better in most cases, the syntax is so simple and makes so much sense that I feel stupid for having used the UPDATE statement! Here it is:
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb;
What's more, I can extend the statement to include an INSERT action for records in table_b that have no matching record in table_a:
MERGE INTO table_a
USING table_b
ON (table_a.key = table_b.key)
WHEN MATCHED THEN UPDATE SET
table_a.col1 = table_b.cola,
table_a.col2 = table_b.colb
WHEN NOT MATCHED THEN INSERT
(key, col1, col2)
VALUES (table_b.key, table_b.cola, table_b.colb);
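For engines without MERGE, the same "update the matches, insert the rest" behaviour can be approximated with two statements. The sketch below is an emulation run against SQLite through Python's sqlite3 module, not Oracle's MERGE itself. The function names make_demo_db and merge_b_into_a and the sample rows are invented for illustration, and it assumes key is unique in both tables.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table_a (key INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT)")
    conn.execute("CREATE TABLE table_b (key INTEGER PRIMARY KEY, cola TEXT, colb TEXT)")
    conn.executemany(
        "INSERT INTO table_a VALUES (?, ?, ?)",
        [(1, "a1", "b1"), (2, "a2", "b2"), (3, "a3", "b3")],
    )
    conn.executemany(
        "INSERT INTO table_b VALUES (?, ?, ?)",
        [(2, "x", "y"), (4, "p", "q")],
    )
    conn.commit()
    return conn


def merge_b_into_a(conn: sqlite3.Connection) -> None:
    """Emulate MERGE: update rows that match, then insert rows that do not."""
    with conn:  # run both statements in one transaction
        # Equivalent of WHEN MATCHED THEN UPDATE
        conn.execute(
            """
            UPDATE table_a
            SET col1 = (SELECT cola FROM table_b WHERE table_b.key = table_a.key),
                col2 = (SELECT colb FROM table_b WHERE table_b.key = table_a.key)
            WHERE EXISTS (SELECT 1 FROM table_b WHERE table_b.key = table_a.key)
            """
        )
        # Equivalent of WHEN NOT MATCHED THEN INSERT
        conn.execute(
            """
            INSERT INTO table_a (key, col1, col2)
            SELECT key, cola, colb
            FROM table_b
            WHERE NOT EXISTS (SELECT 1 FROM table_a WHERE table_a.key = table_b.key)
            """
        )


if __name__ == "__main__":
    db = make_demo_db()
    merge_b_into_a(db)
    for row in db.execute("SELECT key, col1, col2 FROM table_a ORDER BY key"):
        print(row)
```

In this sample, key 2 exists in both tables and is overwritten with the table_b values, key 4 exists only in table_b and is inserted, and keys 1 and 3 are left untouched. MERGE expresses the same thing in a single statement, which is why it reads so much more cleanly.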
This new statement type made my day the day I discovered it :)