What is the best approach to storing product ratings in a database? I have in mind the following two (simplified, and assuming a MySQL db) scenarios:
Create two columns in the products table to store the number and the sum of all votes respectively. Use the columns to get an average at run time or using a query.
This approach means I only need to access one table开发者_如何学Go, simplifying things.
Normalize the data by creating an additional table to store the ratings.
This isolates the ratings data into a separate table, leaving the products table to furnish data on available products. Although it would require a join or a separate query for ratings.
Which approach is best, normalised or denormalised?
A different table for ratings is highly recommended to keep things dynamic. Don't worry about hundreds (or thousands or tens of thousands) of entries, that's all peanuts for databases.
Suggestion:
table products
- id
- name
- etc
table products_ratings
- id
- productId
- rating
- date (if needed)
- ip (if needed, e.g. to prevent double rating)
- etc
Retrieve all ratings for product 1234
:
SELECT pr.rating
FROM products_ratings pr
INNER JOIN products p
ON pr.productId = p.id
AND p.id = 1234
Average rating for product 1234
:
SELECT AVG(pr.rating) AS rating_average -- or ROUND(AVG(pr.rating))
FROM products_ratings pr
INNER JOIN products p
ON pr.productId = p.id
AND p.id = 1234;
And it's just as easy to get a list of products along with their average rating:
SELECT
p.id, p.name, p.etc,
AVG(pr.rating) AS rating_average
FROM products p
INNER JOIN products_ratings pr
ON pr.productId = p.id
WHERE p.id > 10 AND p.id < 20 -- or whatever
GROUP BY p.id, p.name, p.etc;
I know that my answer is not what you actually ask for, but you might want to have a chance of facilitating that new products with your system can almost never beat the old products. Say that you would get a product with 99% rating. It would be very difficult for new products to get high if you sort by products with the highest rating.
Do not store a record of each rating unless you absolutely need them specifically. An example of such a case could be a psychological experiment that tends to analyze specific properties of the raters themselves. So, yeah! You'd have to be just as crazy to store each rate in a separate record.
Now, coming to the solution, add two more columns to your product table: AverageRating
and RateCount
.
What would you store in them? Well, suppose you have an already-calculated average of the two numbers: 2 and 3, which is 2.5; having a new rate of 10, you'll multiply the average (2.5) by the rate count (2 in this case). Now, you have 5. Add this result to the new rate value (10) and divide the result by 3.
Let's cover all the above in a simple formula,
(AverageRating * RateCount + NewRateValue) / (RateCount + 1)
So (2.5 * 2 + 10) / (2 + 1) = 5
.
Calculate the average on the server-side (not in your database) and store the average in the AverageRating
column and the rate count in the RateCount
column.
Simple, right?!
References
https://math.stackexchange.com/a/106314
精彩评论