for example a website where users can vote on questions (up and down) with this model:
users(id)
questions(id)
votes(id,userId,questionid,vote)
-- vote can be +1 or -1, or probably better bit 0 and 1
I'm thinking that when there are going to be lots of questions
with lots of votes, there are going开发者_StackOverflow中文版 to be performance issues, especially when showing the list of questions, so
would it make sense to add a column to the questions like this:
questions(id, votessum)
and each time somebody votes besides doing an insert into the votes questions also to update the questions
and set it's votessum column
I would say that no, out of the gate I would not have such a column on the question table. I would determine via actual or estiamted usage plus performance testing whether the queries required to aggregate this data would become a performance bottleneck.
If the query did create a bottleneck, I would first look at other optimizations, like improving indexes and materialized views before doing this kind of denormalization.
Denormalization is sometime useful (and sometimes necessary), but it has its own problems and pitfalls that make fully exploring other options worthwhile.
I would worry more about accuracy and capturing all data first before I worry about performance.
I would create a separate table called QuestionVotes with columns as QuestionId, UserId, VoteType, TimeStamp
to capture all related activity.
This would help to track all votes, their type, who cast the vote, to prevent double votes, etc.
Now I may consider updating the count in the Questions table if the performance degrades, but I would go the route of indexing and query tuning first.
精彩评论