
Is performing a count() calculation slowing down my mysql query?


I'm still learning about MySQL. I may be making a very basic error, and I'm prepared to be chastened here...

What this query is trying to do is select the top members from our website based on a count of the number of book and recipe reviews they have made.

I'm calculating the total in the SQL query itself. The query is slow (9 seconds) and will definitely not scale, considering we only have 400 members and a few thousand reviews so far, and the site is growing quickly.

I presume it's doing a full table scan here, and that the calculation is slowing it down, but I don't know of an alternative way to do this and would love some wisdom.

Here's the SQL statement:

SELECT users.*,
       COUNT( DISTINCT bookshelf.ID ) AS titles,
       COUNT( DISTINCT book_reviews.ID ) AS bookreviews,
       COUNT( DISTINCT recipe_reviews.ID ) AS numreviews,
       COUNT( DISTINCT book_reviews.ID ) + COUNT( DISTINCT recipe_reviews.ID ) AS reviewtotal
FROM users
LEFT OUTER JOIN recipe_reviews ON recipe_reviews.user_id = users.ID
LEFT OUTER JOIN book_reviews ON book_reviews.user_id = users.ID
LEFT OUTER JOIN bookshelf ON users.ID = bookshelf.user_id
GROUP BY users.ID
ORDER BY reviewtotal DESC
LIMIT 8

Here is the EXPLAIN output:

+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
| id | select_type | table          | type  | possible_keys     | key               | key_len | ref                 | rows | Extra                           |
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+
|  1 | SIMPLE      | users          | index | NULL              | PRIMARY           | 4       | NULL                |  414 | Using temporary; Using filesort | 
|  1 | SIMPLE      | recipe_reviews | ref   | recipe_reviews_fk | recipe_reviews_fk | 5       | users.ID            |   12 |                                 | 
|  1 | SIMPLE      | book_reviews   | ref   | user_id           | user_id           | 5       | users.ID            |    4 |                                 | 
|  1 | SIMPLE      | bookshelf      | ref   | recipe_reviews_fk | recipe_reviews_fk | 5       | users.ID            |   13 |                                 | 
+----+-------------+----------------+-------+-------------------+-------------------+---------+---------------------+------+---------------------------------+

UPDATE & SOLVED:

I realized, and @recursive confirmed, that the query itself is the root of the problem: the multiple joins produce a Cartesian product between the review tables. I rewrote it as a series of subqueries, and the final working code is here:

SELECT  *, bookreviews + recipereviews AS totalreviews
FROM (SELECT users.*,
            (SELECT count(*) FROM bookshelf WHERE bookshelf.user_id = users.ID) as titles,
            (SELECT count(*) FROM book_reviews WHERE book_reviews.user_id = users.ID) as bookreviews,
            (SELECT count(*) FROM recipe_reviews WHERE recipe_reviews.user_id = users.ID) as recipereviews
    FROM users) q

This gives me a result in milliseconds. There are also ways to do this with JOINs. See How to add together the results of several subqueries? if you want to follow this up.
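
For example, one JOIN-based variant (a sketch only, not taken from that question) pre-aggregates each review table in a derived table and then joins the small result sets back to users:

SELECT users.*,
       COALESCE(br.cnt, 0) + COALESCE(rr.cnt, 0) AS totalreviews
FROM users
LEFT JOIN (SELECT user_id, COUNT(*) AS cnt FROM book_reviews GROUP BY user_id) br
       ON br.user_id = users.ID
LEFT JOIN (SELECT user_id, COUNT(*) AS cnt FROM recipe_reviews GROUP BY user_id) rr
       ON rr.user_id = users.ID
ORDER BY totalreviews DESC
LIMIT 8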


For features like this, it is always helpful to work with some kind of caching.

It might already help to compute the sums for all users on a nightly basis and store them with the user record. That alone will speed up the query considerably.
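
A minimal sketch of that idea, assuming a made-up review_count column on users and MySQL's event scheduler (which must be enabled):

-- Hypothetical summary column, refreshed once a day.
ALTER TABLE users ADD COLUMN review_count INT NOT NULL DEFAULT 0;

CREATE EVENT refresh_review_counts
ON SCHEDULE EVERY 1 DAY
DO
  UPDATE users u
  SET u.review_count =
        (SELECT COUNT(*) FROM book_reviews br WHERE br.user_id = u.ID)
      + (SELECT COUNT(*) FROM recipe_reviews rr WHERE rr.user_id = u.ID);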

You should also cache the result of this query for at least a minute or five, since the same query is executed regardless of who is logged in.


You might try seeing if there is an improvement from removing the DISTINCT modifiers. Assuming the DISTINCTed fields are primary keys anyway, this could be causing unnecessary work.


Index all tables on user_id. This could easily speed this query up by orders of magnitude if it hasn't been done yet.
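
For example (a sketch; the index names are made up, and the EXPLAIN output above suggests some of these keys already exist):

CREATE INDEX idx_book_reviews_user_id ON book_reviews (user_id);
CREATE INDEX idx_recipe_reviews_user_id ON recipe_reviews (user_id);
CREATE INDEX idx_bookshelf_user_id ON bookshelf (user_id);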


You need to create indexes on user_id (preferably clustered indexes if possible).

Are you sure you have done this? Remember that having a foreign key does not automatically generate an index on that key.

If you are joining 4 B-Trees of 1k rows each, this should surely not take 9s, but a few milliseconds.

The long execution time indicates that you are performing table scans for each user.

I'm pretty convinced this is the correct answer.

Your query is fine, except that you are COUNTing your reviews twice; replace the second pair of COUNTs with bookreviews + numreviews.


You are trying to accomplish too many things with this query. I see problems with your DB/query design. Why do you have a user_id in bookshelf? How about the following table structure:

CREATE TABLE users (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY (id)
);

CREATE TABLE recipe_reviews (
  id INT NOT NULL AUTO_INCREMENT,
  review VARCHAR(20),
  user_id INT,
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES users(id)
);

CREATE TABLE bookshelf (
  id INT NOT NULL AUTO_INCREMENT,
  name VARCHAR(20) NOT NULL,
  PRIMARY KEY (id)
);

CREATE TABLE book_reviews (
  id INT NOT NULL AUTO_INCREMENT,
  review VARCHAR(20),
  user_id INT,
  bookshelf_id INT,
  PRIMARY KEY (id),
  FOREIGN KEY (user_id) REFERENCES users(id),
  FOREIGN KEY (bookshelf_id) REFERENCES bookshelf(id)
);

If you want to aggregate on users, here is your query :

SELECT users.*,
       COUNT(book_reviews.ID) AS bookreviews,
       COUNT(recipe_reviews.ID) AS recipereviews,
       COUNT(book_reviews.ID) + COUNT(recipe_reviews.ID) AS reviewtotal
FROM users
LEFT OUTER JOIN recipe_reviews ON recipe_reviews.user_id = users.ID
LEFT OUTER JOIN book_reviews ON book_reviews.user_id = users.ID
GROUP BY users.ID
ORDER BY reviewtotal DESC

You can also aggregate on both users and books, but then including recipe_reviews doesn't make sense.

PS: you don't need the DISTINCT, as the keys take care of that.


I often find that creating a smaller temporary table from a larger table has noticeable speed benefits.

So the basic process:

  1. store query (with joins) into temporary table
  2. run count/summary queries on temporary table
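
A sketch of that process against the original schema (tmp_reviews is a made-up name): collapse both review tables into one small temporary table, then aggregate it.

CREATE TEMPORARY TABLE tmp_reviews AS
  SELECT user_id FROM book_reviews
  UNION ALL
  SELECT user_id FROM recipe_reviews;

SELECT users.*, COUNT(tmp_reviews.user_id) AS reviewtotal
FROM users
LEFT JOIN tmp_reviews ON tmp_reviews.user_id = users.ID
GROUP BY users.ID
ORDER BY reviewtotal DESC
LIMIT 8;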


Why not just store the number of reviews per user as a column in the users table? Every new review would then also increment the review count on that user's record by one.

For example:

user_id user_name number_of_reviews
1       bob       5
2       jane      10

Bob puts a new review in, and you up his number to 6:

review_id user_id review_text
16        1       "Great!"

user_id user_name number_of_reviews
1       bob       6
2       jane      10

Now you can simply get the top 5 reviewers like this:

SELECT * FROM users ORDER BY number_of_reviews DESC LIMIT 5
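
One way to keep that counter in sync is a trigger rather than application code. A minimal sketch, assuming a single reviews table like the example above and a number_of_reviews column on users (deletes would need a matching AFTER DELETE trigger):

CREATE TRIGGER review_added
AFTER INSERT ON reviews
FOR EACH ROW
  UPDATE users
  SET number_of_reviews = number_of_reviews + 1
  WHERE users.ID = NEW.user_id;

Alternatively, the application can issue the same UPDATE itself whenever it saves a review.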