I'm building a voting system whereby each vote is captured in a votes table with the UserID
and the DateTime
of the vote along with an int
of either 1 or -1.
I'm also keeping a running total of TotalVotes
in the table that contains the item that the user actually voted on. This way I'm not constantly running a query to SUM
the Vote table.
My Question is kind of a pros/cons question when it comes to updating the TotalVotes field. With regards to code manag开发者_StackOverflow社区eability, putting an additional update method in the application makes it easy to troubleshoot and find any potential problems. But if this application grows significantly in it's user base, this could potentially cause a lot of additional SQL calls from the app to the DB. Using a trigger keeps it "all in the sql family" so to speak, and should add a little performance boost, as well as keep a mundane activity out of the code base.
I understand that premature optimization could be called in this specific question, but since I haven't built it yet, I might as well try and figure out the better approach right out of the gate.
Personally I'm leaning towards a trigger. Please give me your thoughts/reasoning.
Another option is to create view on the votes table aggregating the votes as TotalVotes. Then index the view.
The magic of SQL Server optimizer (enterprise edition only i think) is that when it sees queries of sum(voteColumn) it will pick that value up from the index on the view of the same data, which is amazing when you consider you're not referencing the view directly in your query!
If you don't have the enterprise edition you could query for the total votes on the view rather than the table, and then take advantage of the index.
Indexes are essentially denormalization of your data that the optimizer is aware of. You create or drop them as required, and let the optimizer figure it out (no code changes required) Once you start down the path of your own hand crafted denormalization you'll have that baked into your code for years to come.
Check out Improving performance with indexed views
There are some specific criteria that must be met to get indexed views working. Here's a sample based on a guess of your data model:
create database indexdemo
go
create table votes(id int identity primary key, ItemToVoteOn int, vote int not null)
go
CREATE VIEW dbo.VoteCount WITH SCHEMABINDING AS
select ItemToVoteOn, SUM(vote) as TotalVotes, COUNT_BIG(*) as CountOfVotes from dbo.votes group by ItemToVoteOn
go
CREATE UNIQUE CLUSTERED INDEX VoteCount_IndexedView ON dbo.VoteCount(itemtovoteon)
go
insert into votes values(1,1)
insert into votes values(1,1)
insert into votes values(2,1)
insert into votes values(2,1)
insert into votes values(2,1)
go
select ItemToVoteOn, SUM(vote) as TotalVotes from dbo.votes group by ItemToVoteOn
And this query (which doesn't reference the view or by extension it's index) results in this execution plan. Notice the index is used. Of course drop the index, (and gain insert performance)
And one last word. Until you're up and running there is really know way that you'll know if any sort of denormalization will in fact help overall throughput. By using indexes you can create them, measure if it helps or hurts, and then keep or drop them as required. It's the only kind of denormalization for performance that is safe to do.
I'd suggest you build a Stored Procedure that does both the vote insert and the update on the total votes. Then your application only has to know how to record a vote, but the logic on exactly what is going on when you call that is still contained in one place (the stored procedure, rather then an ad-hoc update query and a seperate trigger).
It also means that later on if you want to remove the update to total votes, all you have to change is the procedure by commenting out the update part.
The premature premature optimization is saving the total in a table instead of just summing the data as needed. Do you really need to denormalize the data for performance?
If you do not need to denormalize the data then you do not need to write a trigger.
I've done the trigger method for years and was always happier for it. So, as they say, "come on in, the water's fine." However, I usually do it when there are many many tables involved, not just one.
The pros/cons are well known. Materializing the value is a "pay me now" decision, you pay a little more on the insert to get faster reads. This is the way to go if and only if you want a read in 5 milliseconds instead of 500 milliseconds.
PRO: The TotalVotes will always be instantly available with one read.
PRO: You don't have to worry about code path, the code that makes the insert is much simpler. Multiplied over many tables on larger apps this is a big deal for maintainability.
CON: For each INSERT you also pay with an additional UPDATE. It takes a lot more inserts/second than most people think before you ever notice this.
CON: For many tables, manually coding triggers can get tricky. I recommend a code generator, but as I wrote the only one I know about, that would get me into self-promotion territory. If you have only one table, just code it manually.
CON: To ensure complete correctness, it should not be possible to issue an UPDATE from a console or code to modify TotalVotes. This means it is more complicated. The trigger should execute as a special superuser that is not normally used. A second trigger on the parent table fires on UPDATE and prevents changes to TotalVotes unless the user making the update is that special super-user.
Hope this gives you enough to decide.
My first gut instinct would be to write a UDF to perform the SUM operation and make TotalVotes
a computed column based on that UDF.
精彩评论