I have three tables:
- books;
- authors;
- list_items (contains bestseller lists from New York Times)
I use this query to get an author whose books stayed in bestseller lists for maximum number of weeks:
SELECT authors.full_name, COUNT(*) FROM authors
INNER JOIN books ON books.author LIKE CONCAT('%', authors.full_name, '%')
INNER JOIN list_items ON list_items.book_title = books.title
GROUP BY authors.full_name ORDER BY count(*) DESC LIMIT 1
This query takes around 6 minutes, while similar query without the second JOIN takes much less than a second. How to optimize my query?
UPDATE: the EXPLAIN output is this:
table       type    possible_keys  key      key_len  ref                    rows   Extra
authors     ALL     <NULL>         <NULL>   <NULL>   <NULL>                 2555   Using temporary; Using filesort
list_items  ALL     book_name      <NULL>   <NULL>   <NULL>                 31040  Using join buffer
books       eq_ref  PRIMARY        PRIMARY  767      list_items.book_title  1      Using where
I don't think you need the extra subselect. I took your faster statement and removed the inner SELECT. Not sure how much this affects execution time, but it's a little cleaner, imho.
SELECT COUNT(*), authors.full_name
FROM list_items
INNER JOIN books ON books.title = list_items.book_title
INNER JOIN relations ON books.id = relations.book_id
INNER JOIN authors ON authors.id = relations.author_id
GROUP BY authors.full_name
ORDER BY COUNT(*) DESC;
Eventually, I solved this problem by restructuring my database schema. The root problem was the field books.author, which could contain several author names (hence the CONCAT in the initial query). I added a new table to model the many-to-many relationship, linking authors.id and books.id.
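A minimal sketch of that join table, with table and column names taken from the queries in this answer (types are assumptions, adjust to your data):

```sql
-- Assumed many-to-many link table between books and authors;
-- column names follow the queries below, types are illustrative.
CREATE TABLE relations (
    book_id   INT NOT NULL,
    author_id INT NOT NULL,
    PRIMARY KEY (book_id, author_id),
    FOREIGN KEY (book_id)   REFERENCES books (id),
    FOREIGN KEY (author_id) REFERENCES authors (id)
);
```

The composite primary key both prevents duplicate author/book pairs and serves as an index for joins on book_id.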
Then I used this query instead:
SELECT COUNT(*), items.full_name
FROM list_items
INNER JOIN
(SELECT books.title, authors.full_name
FROM books INNER JOIN relations ON books.ID = relations.book_id
INNER JOIN authors ON authors.ID = relations.author_id
) items ON items.title = list_items.book_title
GROUP BY items.full_name
ORDER BY COUNT(*) DESC;
Execution time dropped to 0.4 seconds.
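If you need to shave it further, indexes on the join columns should let MySQL avoid the full scans visible in the EXPLAIN above; this is a sketch, and the index names are made up:

```sql
-- Illustrative supporting indexes; names are arbitrary.
CREATE INDEX idx_list_items_book_title ON list_items (book_title);
CREATE INDEX idx_relations_book_id     ON relations (book_id);
```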