The following query takes FOREVER to execute (30+ hrs on a Macbook w/4gig ram) - I'm looking for ways to make it run more efficiently. Any thoughts are appreciated!
CREATE TABLE fc AS
SELECT threadid,
title,
body,
date,
userlogin
FROM f
WHERE pid
NOT IN (SELECT pid FROM ft) O开发者_如何学GoRDER BY date;
(table "f" is ~1 Gig / 1,843,000 row, table "ft" is 168mb, 216,000 rows) )
Try an outer join (I think MySQL supports them now) instead of a not in:
create table fc as
select f.threadid
, f.title
, f.body
, f.date
, f.userlogin
from f
left outer join ft
on f.pid = ft.pid
where ft.pid is null
order by date
Start with EXPLAIN PLAN to see what the optimizer says. Then re-run it when you make changes to see if they help.
I'll bet the right query will run in minutes.
Add an clustered index on pid on both fc and ft tables.
Make sure you have a pid index on ft. It sounds like you are getting the full cross product instead of a join by index.
There can be some hidden costs. How long does it take to run this:
SELECT count(*)
FROM f
WHERE pid
NOT IN (SELECT pid FROM ft);
If it doesn't take long, then your command's slowness may be MySQL duplicating all the data as the statement executes just in case it fails and has to roll it back. (I've seen this with SQL Server.)
Also: is it any different if you take out the ORDER BY clause?
How many rows in f
won't match a row in ft
? In the most extreme case, if pid
is unique in f
your target table fc
will contain >1.6m rows. If the bulk of the rows will end up in fc
you would be better off doing this in two stages:
CREATE TABLE fc AS
SELECT threadid,
title,
body,
date,
userlogin
FROM f
ORDER BY date;
DELETE FROM fc
WHERE pid
IN (SELECT pid FROM ft);
Incidentally, can you ditch the ORDER BY clause? That sort could cost a lot of cycles, again depending on how many rows there are in the target table.
Another thing to consider is the EXISTS clause...
CREATE TABLE fc AS
SELECT threadid,
title,
body,
date,
userlogin
FROM f
WHERE NOT EXISTS
(SELECT pid FROM ft
WHERE ft.pid = f.id)
ORDER BY date;
... or in my two-step version ...
DELETE FROM fc
WHERE EXISTS
(SELECT pid FROM ft
WHERE ft.pid = f.id);
EXISTS can be a lot faster than IN when the sub-query generates a lot of rows. However, as is always the case with tuning, benchmarking is key.
精彩评论