开发者

Strange Performance Issues with INNER JOIN vs. LEFT JOIN

开发者 https://www.devze.com 2023-01-29 09:07 出处:网络
I was using a query that looked similar to this one: SELECT `episodes`.*, IFNULL(SUM(`views_sum`.`clicks`), 0) as `clicks`

I was using a query that looked similar to this one:

SELECT `episodes`.*, IFNULL(SUM(`views_sum`.`clicks`), 0) as `clicks`
FROM `episodes`, `views_sum`
WHERE `views_sum`.`index` = "episode" AND `views_sum`.`key` = `episodes`.`id`
GROUP BY `episodes`.`id`

... which takes ~0.1s to execute. But it's problematic, because some episodes don't have a corresponding views_sum row, so those episodes aren't included in the result.

What I want is NULL values when a corresponding views_sum row doesn't exist, so I tried using a LEFT JOIN instead:

SELECT `episodes`.*, IFNULL(SUM(`views_sum`.`clicks`), 0) as `clicks`
FROM `episodes`
LEFT JOIN `views_sum` ON (`views_sum`.`index` = "episode" AND `views_sum`.`key` = `episodes`.`id`)
GROUP BY `episodes`.`id`

This query produces the same columns, and it also includes the few rows missing from the 1st query.

BUT, the 2nd query takes 10 times as开发者_如何学JAVA long! A full second.

Why is there such a huge discrepancy between the execution times when the result is so similar? There's nowhere near 10 times as many rows — it's like 60 from the 1st query, and 70 from the 2nd. That's not to mention that the 10 additional rows have no views to sum!

Any light shed would be greatly appreciated!

(There are indexes on episodes.id, views_sum.index, and views_sum.key.)

EDIT:

I copy-pasted the SQL from above, and here are the EXPLAINs, in order:

id  select_type table       type    possible_keys   key         key_len     ref                         rows    Extra
1   SIMPLE      views_sum   ref     index,key       index       27          const                       6532    Using where; Using temporary; Using filesort
1   SIMPLE      episodes    eq_ref  PRIMARY         PRIMARY     4           db102914_itw.views_sum.key  1       Using where

id  select_type table       type    possible_keys   key         key_len     ref                         rows    Extra
1   SIMPLE      episodes    ALL     NULL            NULL        NULL        NULL                        70      Using temporary; Using filesort
1   SIMPLE      views_sum   ref     index,key       index       27          const                       6532 


Here's the query I ultimately came up with, after many, many iterations. (The SQL_NO_CACHE flag is there so I can test execution times.)

SELECT SQL_NO_CACHE e.*, IFNULL(SUM(vs.`clicks`), 0) as `clicks`

FROM `episodes` e

LEFT JOIN
(SELECT * FROM `views_sum` WHERE `index` = "episode") vs
ON vs.`key` = e.`id`

GROUP BY e.`id`

Because the ON condtion views_sum.index = "episode" is static, i.e., isn't dependent on the row it's joined to, I was able to get a massive performance boost by first using a subquery to limit the views_sum table before joining.

My query now takes ~0.2s. And what's even better, the time doesn't grow as you increase the offset of the query (unlike my first LEFT JOIN attempt). It stays the same, even if you do a sort on the clicks column.


You should have a combined index on views_sum.index and views_sum.key. I suspect you will always use both fields together if i look at the names. Also, I would rewrite the first query to use a proper INNER JOIN clause instead of a filtered cartesian product.

I suspect the performance of both queries will be much closer together if you do this. And, more importantly, much faster than they are now.

edit: Thinking about it, I would probably add a third column to that index: views_sum.clicks, which probably can be used for the SUM. But remember that multi-column indexes can only be used left to right.


It's all about the indexes. You'll have to play around with it a bit or post your database schema on here. Just as a rough guess i'd say you should make sure you have an index on views_sum.key.


Normally, a LEFT JOIN will be slower than an INNER JOIN or a CROSS JOIN because it has to view the first table differently. Put another way, the difference in time isn't related to the size of the result, but the full size of the left table.

I also wonder if you're asking MySQL to figure things out for you that you should be doing yourself. Specifically, that SUM() function would normally require a GROUP BY clause.

0

精彩评论

暂无评论...
验证码 换一张
取 消