I have a large table foo_large and a relatively small (a few hundred thousand rows) table foo_small. The large table has a primary-key column "id"; the small one also has "id" column that is a foreign key to foo_large. I want to update foo_small so that for every row its col_y has a value equal to col_x in the corresponding row of foo_large. The most straightforward way seems to be like this:
update foo_small sm
set col_y = (
select col_x
from foo_large
where id = sm.id);
This is, however, very inefficient. For each row of foo_small, the corresponding row of foo_large is accessed by the index on its primary key. Though foo_small is small compared to foo_large, it still leads to several hundred thousand of index scans on that table. A better solution would be to hash foo_small in memory and perform a (possibly parallelized) full scan on foo_large once, updating the matching rows of foo_small encountered. I can do that in the following way:
update
(
select /*+ ordered use_hash(lg) parallel(lg 2) */
sm.*, lg.col_x
from
foo_small sm,
foo_large lg
where sm.id = lg.id
)
set col_y = col_x;
This query finishes within a minute. Unfortunately, it has another disadvantage: It requires that the user launching this query has a privilege to update not only foo_small, but also foo_large even though the latter table is not actually updated. Is there a solution to force the latter execution plan without updating a join? I know I could write a piece of procedural PL/SQL code with bulk fetch/update and probably keep most of the performance gain, but I suppose there must be a wa开发者_如何学Cy to do it in a single query.
Thanks in advance.
Here follows the ultimate query, suggested by Shannon, that does the job:
merge /*+ leading(sm) full(lg) use_hash(lg) parallel(lg 2) */
into foo_small sm
using foo_large lg
on (lg.id = sm.id)
when matched then
update set sm.col_y = lg.col_x
Is it necessary to update every row in foo_small everytime you run the query - is the data changing that frequently ? What drives the changes to col_x in foo_large - could you have an update indexed flag column or timestamp so you only have to update rows that have actually changed ?
update foo_small sm
set col_y = (
select col_x
from foo_large
where id = sm.id)
where last_updated>=TRUNC(SYSDATE)-1;
精彩评论