开发者

MySQL query optimization. Avoiding temporary & filesort

开发者 https://www.devze.com 2023-03-28 06:33 出处:网络
Currently I have a table with close to 1 million rows, which I need to query from.What I need to be able to do is stack rank packages on the number of products they include from a given list of produc

Currently I have a table with close to 1 million rows, which I need to query from. What I need to be able to do is stack rank packages on the number of products they include from a given list of product id's.

SELECT count(productID) AS commonProducts, packageID
FROM supply
WHERE productID IN (2,3,4,5,6,7,8,9,10) 
GROUP BY packageID 
ORDER BY commonProducts 
DESC LIMIT 10

The query works fine, but I would like to improve upon it. I tried a mul开发者_运维百科ti-column index on productID and packageID, but it seemed to seek more rows than just having a separate index for each of the columns.

MySQL Explain

select_type: SIMPLE
table: supply
type: range
possible_keys: supplyID
key: supplyID
key_len: 3
ref: null
rows: 996
extra: Using where; Using temporary; Using filesort

My main concern is that the query is using a temporary table and filesort. How could I go about optimizing this query? I presume that the biggest issues is count() and the ORDER BY on the results of count().


You can remove the temp table using a Dependent Subquery:

select * from 
  (
   SELECT count(productID) AS commonProducts, s.productId, s.packageID 
   FROM supply as s
   WHERE EXISTS
   (
      select 1 from supply as innerS 
        where innerS.productID in (2,3,4,5,6,7,8,9,10) 
          and s.productId = innerS.productId 
   )
   GROUP BY s.packageID
  ) AS t
ORDER BY t.commonProducts 
DESC LIMIT 10

The inner query links to the outer query and preserves the index. You'll find that any query that sorts on commonProducts, including the above query, will use a filesort, as count(*) is definitely not indexed. But fear not, filesort is just a fancy word for sort -- mysql can choose to use an effective in-memory sort -- and whether you did it now or as a mergesort on the way to an indexed temporary table, you'll have to pay for that sorting somewhere. However, this case is pretty good because filesort will stop sorting once it hits the LIMIT you've put in place. It will not sort the entire list of commonProducts.

Update

If this query is going to be run all the time, I would recommend (without getting too fancy) setting triggers on the supply table to update a legitimate table that tracks counters like this one.


Creatng a temporary resulte set:

SELECT  TMP.*
FROM (  SELECT count(productID) AS commonProducts, packageID
        FROM supply
        WHERE productID IN (2,3,4,5,6,7,8,9,10)
        GROUP BY packageID 
) AS TMP 

ORDER BY commonProducts 
DESC LIMIT 10


Perhaps it's not the most elegant way and I cannot guarantee it will be faster because everything depends on your particular data. But in some cases this gives much better results:

SELECT count(*) AS commonProducts, packageID
FROM (
    SELECT packageID FROM supply WHERE productID = 2
    UNION ALL
    SELECT packageID FROM supply WHERE productID = 3
    UNION ALL
    .
    .
    .
    SELECT packageID FROM supply WHERE productID = 10
) AS t
GROUP BY packageID
ORDER BY commonProducts DESC
LIMIT 10
0

精彩评论

暂无评论...
验证码 换一张
取 消