Currently I have a table with close to 1 million rows, which I need to query from. What I need to be able to do is stack rank packages on the number of products they include from a given list of product id's.
SELECT count(productID) AS commonProducts, packageID
FROM supply
WHERE productID IN (2,3,4,5,6,7,8,9,10)
GROUP BY packageID
ORDER BY commonProducts
DESC LIMIT 10
The query works fine, but I would like to improve upon it. I tried a mul开发者_运维百科ti-column index on productID and packageID, but it seemed to seek more rows than just having a separate index for each of the columns.
MySQL Explain
select_type: SIMPLE
table: supply
type: range
possible_keys: supplyID
key: supplyID
key_len: 3
ref: null
rows: 996
extra: Using where; Using temporary; Using filesort
My main concern is that the query is using a temporary table and filesort. How could I go about optimizing this query? I presume that the biggest issues is count() and the ORDER BY on the results of count().
You can remove the temp table using a Dependent Subquery:
select * from
(
SELECT count(productID) AS commonProducts, s.productId, s.packageID
FROM supply as s
WHERE EXISTS
(
select 1 from supply as innerS
where innerS.productID in (2,3,4,5,6,7,8,9,10)
and s.productId = innerS.productId
)
GROUP BY s.packageID
) AS t
ORDER BY t.commonProducts
DESC LIMIT 10
The inner query links to the outer query and preserves the index. You'll find that any query that sorts on commonProducts, including the above query, will use a filesort, as count(*)
is definitely not indexed. But fear not, filesort is just a fancy word for sort -- mysql can choose to use an effective in-memory sort -- and whether you did it now or as a mergesort on the way to an indexed temporary table, you'll have to pay for that sorting somewhere. However, this case is pretty good because filesort will stop sorting once it hits the LIMIT
you've put in place. It will not sort the entire list of commonProducts.
Update
If this query is going to be run all the time, I would recommend (without getting too fancy) setting triggers on the supply table to update a legitimate table that tracks counters like this one.
Creatng a temporary resulte set:
SELECT TMP.*
FROM ( SELECT count(productID) AS commonProducts, packageID
FROM supply
WHERE productID IN (2,3,4,5,6,7,8,9,10)
GROUP BY packageID
) AS TMP
ORDER BY commonProducts
DESC LIMIT 10
Perhaps it's not the most elegant way and I cannot guarantee it will be faster because everything depends on your particular data. But in some cases this gives much better results:
SELECT count(*) AS commonProducts, packageID
FROM (
SELECT packageID FROM supply WHERE productID = 2
UNION ALL
SELECT packageID FROM supply WHERE productID = 3
UNION ALL
.
.
.
SELECT packageID FROM supply WHERE productID = 10
) AS t
GROUP BY packageID
ORDER BY commonProducts DESC
LIMIT 10
精彩评论