开发者

SQL query speed with two inner joins and concat

开发者 https://www.devze.com 2023-02-17 02:37 出处:网络
I have three tables: books; authors; list_items (contains bestseller lists from New York Times) I use this query to get an author whose books stayed in bestseller lists for maximum number of week

I have three tables:

  • books;
  • authors;
  • list_items (contains bestseller lists from New York Times)

I use this query to get an author whose books stayed in bestseller lists for maximum number of weeks:

SELECT authors.full_name, COUNT(*) FROM authors
 INNER JOIN books ON books.author LIKE CONCAT('%', authors.full_name, '%')
 INNER JOIN list_items ON list_items.book_title = books.title
 GROUP BY authors.full_name ORDER BY count(*) DESC LIMIT 1

This query takes around 6 minutes, while similar query without the second JOIN takes much less than a second. How to optimize my query?

UPDATE EXPLAIN is this:

table      type   possible_keys key      key_len ref         rows   Extra
authors    ALL    <NULL>        <NULL>   <NULL>  <NULL>      2555   Using temporary; Using filesort
list_items ALL    book_name     <NULL>   <NULL>  <NULL>   31040  Using join buffer
books      eq_ref PRIMARY      PRIMARY  767     list_items开发者_开发问答. 1      Using where
                                                book_title


I don't think you need to use an extra subselect statement. I took your faster statement and removed the inner select statement. Not sure how much this would affect execution time, but its a little bit cleaner, imho.

SELECT COUNT(*), authors.full_name
FROM list_items
INNER JOIN books ON books.title = list_items.title
INNER JOIN relations ON books.id = relations.book_id
INNER JOIN authors ON authors_id = relations.author_id
GROUP BY authors.full_name
ORDER BY COUNT(*);


Eventually, I solved this problem by restructuring my database scheme.

Root problem was in the field books.author that could contain several author names, hence CONCAT in the initial query. I added a new table to apply many to many relationship, where authors.id and books.id were linked.

Then I used this query instead:

SELECT COUNT (*), items.full_name
    FROM list_items
         INNER JOIN
         (SELECT books.title, authors.full_name
            FROM books INNER JOIN relations ON books.ID = relations.book_id
                 INNER JOIN authors ON authors.ID = relations.author_id
                 ) items ON items.title = list_items.book_title
GROUP BY items.full_name
ORDER BY COUNT (*) DESC;

Execution time dropped to 0.4 seconds.

0

精彩评论

暂无评论...
验证码 换一张
取 消