I have a big table on mysql (innodb) which contains products assets (13 millions of rows). Here a little schema of my database :
product <-many2one-- file_item --one2many--> family --many2one--> download_type
The *file_item* table is the big table with millions of rows. I try to count products by download types with the following sql query :
select t.name as type,
count(p.product_id) as n
from file_item p
inner join family f on f.id = p.family_id
inner join type t on f.id_type = t.id
group by t.id order by t.name;
There are 3 indexes on *file_item* table:
- 开发者_JAVA百科
- product_family_idx (product_id, family_id)
- family_idx (family_id)
- product_idx (product_id) Explain output :
+----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+ | 1 | SIMPLE | p | ALL | FAMILY_IDX,PRODUCT_FAMILY_IDX | NULL | NULL | NULL | 13862870 | Using temporary; Using filesort | | 1 | SIMPLE | f | eq_ref | PRIMARY,TYPE_ID | PRIMARY | 4 | MEDIA.p.FAMILY_IDX| 1 | | | 1 | SIMPLE | t | eq_ref | PRIMARY | PRIMARY | 4 | MEDIA.f.TYPE_ID | 1 | | +----+-------------+-------+--------+-----------------------------------+---------+---------+-------------------+----------+---------------------------------+
The query takes more than 1 hour to return the results. Please how I can optimize the query ?!
Here is your original query:
select t.name as type,
count(p.product_id) as n
from file_item p
inner join family f on f.id = p.family_id
inner join type t on f.id_type = t.id
group by t.id order by t.name;
You will need to make two major changes:
MAJOR CHANGE # 1 : Refactor the Query
SELECT A.ProductCount,B.name type
FROM
(
SELECT id_type id,COUNT(1) ProductCount
FROM
(
SELECT p.id_type
FROM (SELECT family_id,id_type FROM file_item) p
INNER JOIN (SELECT id FROM family) f on f.id = p.family_id
) AA
GROUP BY id_type
) A
INNER JOIN type B USING (id)
ORDER BY B.name;
MAJOR CHANGE # 2 : Create Indexes That Will Support the Refactored Query
ALTER TABLE file_item ADD INDEX family_type_idx (family_id,id_type);
Give it a Try !!!
Lets decompose the query into parts:
- First, fetch each row of file_item => 13M rows
- For each returned row, fetch a row of family matching f.id = p.family_id. => 13M fetches, 13M rows
- For each returned row, fetch a row of type matching f.id_type = t.id. => 13M fetches, 13M rows
- Group by type.id => 10 rows
- Sort by type.name => 10 rows to sort
As you can see, your query needs fetch 13M rows from family and 13M rows from type.
You should start be reducing the number of row fetches needed to execute the query:
Assuming that f.id_type
is a non-NULL foreign key, you can change the inner join type t
to a left join type t
. Then, change group by t.id
to group by f.id_type
.
Grouping on the f
table instead of the t
table and changing the inner join to a left join allows MySQL to execute the group by
before fetching rows from t
.
group by
drastically reduces the number of rows, so this drastically reduce the number of fetches from t
too:
- First, fetch each row of file_item => 13M rows
- For each returned row, fetch a row of family matching f.id = p.family_id. => 13M fetches, 13M rows
- Group by type.id => 10 rows
- For each returned row, fetch a row of type matching f.id_type = t.id. => 10 fetches, 10 rows
- Sort by type.name => 10 rows to sort
The result is that the query already fetches 13M less rows.
You can reduce that even more by denormalizing the schema a little:
If you add a family_type_id column in file_item, you could rewrite your query like this:
SELECT count(1)
FROM file_item p
JOIN type t ON t.id = p.family_type_id
GROUP BY p.family_type_id
ORDER BY t.name
With an index on file_item.family_type_id, this query should execute in milliseconds.
精彩评论