开发者

Please help optimizing a long running query (left outer join, with 2 derived tables)

开发者 https://www.devze.com 2023-02-02 10:03 出处:网络
The query I need help with is: SELECT d.bn, d.4700, d.4500, ... , p.`Activity Description` FROM ( SELECT temp.bn, temp.4700, temp.4500, ....

The query I need help with is:

SELECT d.bn, d.4700, d.4500, ... , p.`Activity Description`  
FROM   
( SELECT temp.bn, temp.4700, temp.4500, ....  
FROM `tdata` temp  
GROUP BY temp.bn  
HAVING (COUNT(temp.bn) = 1) ) d   
LEFT OUTER JOIN  
( SELECT temp2.bn, max(temp2.FPE) AS max_fpe, temp2.`Activity Description`
FROM `pdata` temp2
GROUP BY temp2.bn ) p  
ON p.bn = d.bn;

The ... represents other fields that aren't really important to solving this problem.

The issue is on the the second derived table - it is not using the index I have created and I am not sure why, it seems to be because of the way TEXT fields are handled. The first subquery uses the index I have created and runs quite snappy, however an EXPLAIN on the second shows a 'Using temporary; Using filesort'. Please see the indexes I have created in the below table create statements. Can anyone help me optimize this?

By way of quick explanation the first subquery is meant to only select records that have unique bn's, the second, while it looks a bit wacky (with the max function there which is not being used in the result set) is making sure that only one record from the right part of the join is included in the result set.

My table create statements are

CREATE TABLE `tdata` (
`BN` varchar(15) DEFAULT NULL,
`4000` varchar(3) DEFAULT NULL,
`5800` varchar(3) DEFAULT NULL,
....
KEY `BN` (`BN`),
KEY `idx_t3010`(`BN`,`4700`,`4500`,`4510`,`4520`,`4530`,`4570`,`4950`,`5000`,`5010`,`5020`,`5050`,`5060`,`5070`,`5100`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8

CREATE TABLE `pdata` (
`BN` varchar(15) DEFAULT NULL,
`FPE` datetime DEFAULT NULL,
`Activity Description` text,
....
KEY `BN` (`BN`),
KEY `idx_programs_2009` (`BN`,`FPE`,`Activity Description`(100))
) ENGINE=MyISAM DEFAULT CHARSET=utf8

Thanks!

EDIT

Thanks to Malvolio and Brian Hooper for their comments. Malvolio's suggestion won't work for me as in both tables there are records with identical bn's and no common field that is unique to these records.

It really boils down to the the second derived table query:

SELECT temp2.bn, max(temp2.FPE) AS max_fpe, temp2.Activity Description 
FROM pdata temp2 
GROUP BY temp2.bn;

No matter what I do here to create an index on the Activity Description TEXT field, the query will not use it according to the EXPLAIN. If it would use an index, I'm sure this query would run great (as the first derived table query runs very speedy). Alternatively, if there was a better way of structuring this query to ensure there was 开发者_C百科only one record per bn, that would work too.

Thanks.


Subselects are usually the quickest way to a slow query. I'm not sure exactly what you are trying to do, but you could select the BN from pdata with the FPE with the following query

SELECT p.* FROM pdata p 
LEFT JOIN pdata p0 ON p.BN = p0.BN AND p.FPE < p0.FPE 
WHERE p0.BN IS NULL

Similarly, if you had some column in tdata that was unique (or unique among rows with the same BN)

SELECT t.* FROM tdata t 
LEFT JOIN tdata t0 ON t.BN = t0.BN AND t.SOMEUNIQUEKEY != t0.SOMEUNIQUEKEY 
WHERE t0.BN IS NULL

Something quirky about subselects: they are always much slower than an equivalent join. I think it's a bug.

EDIT

Hooper, the ichthyologist, was unclear on how the LEFT JOIN pdata p0 ON p.BN = p0.BN ... WHERE p0.BN IS NULL stuff worked. Let me go through it step by step with a much simpler example. You have a table names, with a last name and a first name and you want to find the unique last names (that is, every last name held by only one person. The data are as follows:

last first
Smith Will
Smith John
Smith Adam
Jones John

First try the left-join by itself

SELECT n1.last, n1.first, n2.last, n2.first FROM names n1 
  LEFT JOIN names n2 ON n1.last = n2.last and n1.first != n2.first

that will return

last first last first
Smith Will Smith John 
Smith John Smith Will
Smith Adam Smith John 
Smith Will Smith Adam 
Smith John Smith Adam 
Smith Adam Smith Will
Jones John NULL  NULL

Notice those nulls in the last row? That was no boating accident, that is the difference between an ordinary inner join and a left join. An inner join (find all pairs of rows with the same last name and a different first name) would have found the first six, but ignore the unpaired seventh. The only function of the LEFT JOIN is to pad out with nulls anything not filled in by the ON clause.

Now we pull only that row:

SELECT n1.last, n1.first, n2.last, n2.first FROM names n1 
  LEFT JOIN names n2 ON n1.last = n2.last and n1.first != n2.first
  WHERE n2.last IS NULL

And (assuming no nulls in the underlying data), we get only the row that the ON clause failed to match.

Ta, and if I may so so, da.

0

精彩评论

暂无评论...
验证码 换一张
取 消