开发者

Optimize this MySQL query?

开发者 https://www.devze.com 2022-12-27 04:11 出处:网络
The following query takes FOREVER to execute (30+ hrs on a Macbook w/4gig ram) - I\'m looking for ways to make it run more efficiently. Any thoughts are appreciated!

The following query takes FOREVER to execute (30+ hrs on a Macbook w/4gig ram) - I'm looking for ways to make it run more efficiently. Any thoughts are appreciated!

CREATE TABLE fc AS 
SELECT  threadid,
    title,
    body,
    date,
    userlogin
FROM f 
WHERE pid 
    NOT IN (SELECT pid FROM ft) O开发者_如何学GoRDER BY date;

(table "f" is ~1 Gig / 1,843,000 row, table "ft" is 168mb, 216,000 rows) )


Try an outer join (I think MySQL supports them now) instead of a not in:

create table fc as 
select f.threadid
     , f.title
     , f.body
     , f.date
     , f.userlogin 
from f 
left outer join ft 
  on f.pid = ft.pid 
where ft.pid is null 
order by date


Start with EXPLAIN PLAN to see what the optimizer says. Then re-run it when you make changes to see if they help.

I'll bet the right query will run in minutes.


Add an clustered index on pid on both fc and ft tables.


Make sure you have a pid index on ft. It sounds like you are getting the full cross product instead of a join by index.


There can be some hidden costs. How long does it take to run this:

SELECT  count(*)
FROM f 
WHERE pid 
    NOT IN (SELECT pid FROM ft);

If it doesn't take long, then your command's slowness may be MySQL duplicating all the data as the statement executes just in case it fails and has to roll it back. (I've seen this with SQL Server.)

Also: is it any different if you take out the ORDER BY clause?


How many rows in f won't match a row in ft? In the most extreme case, if pid is unique in f your target table fc will contain >1.6m rows. If the bulk of the rows will end up in fc you would be better off doing this in two stages:

CREATE TABLE fc AS 
SELECT  threadid,
    title,
    body,
    date,
    userlogin
FROM f
ORDER BY date;

DELETE FROM fc
WHERE pid 
     IN (SELECT pid FROM ft);

Incidentally, can you ditch the ORDER BY clause? That sort could cost a lot of cycles, again depending on how many rows there are in the target table.

Another thing to consider is the EXISTS clause...

CREATE TABLE fc AS 
SELECT  threadid,
    title,
    body,
    date,
    userlogin
FROM f 
WHERE NOT EXISTS 
    (SELECT pid FROM ft 
     WHERE ft.pid = f.id)
ORDER BY date;

... or in my two-step version ...

DELETE FROM fc
WHERE EXISTS
     (SELECT pid FROM ft 
 WHERE ft.pid = f.id);

EXISTS can be a lot faster than IN when the sub-query generates a lot of rows. However, as is always the case with tuning, benchmarking is key.

0

精彩评论

暂无评论...
验证码 换一张
取 消