There is a 'team_sector' table with the following fields: Id, team_id, sect_id, size, level.
It contains a few records for each 'team' entity (referenced via the 'team_id' field). Each record represents one sector of the team's stadium (8 sectors in total).
Now I need to implement a few searches:
- by overall stadium size (SUM(size));
- by best quality (SUM(level)/COUNT(*)).
I could create a query something like this:
SELECT TS.team_id, SUM(TS.size) AS OverallSize, SUM(TS.level)/COUNT(TS.Id) AS QualityLevel
FROM team_sector TS
GROUP BY TS.team_id
ORDER BY OverallSize DESC -- or ORDER BY QualityLevel DESC, depending on the search
But my concern here is that the calculation for each team will be performed every time the query runs. It is not too big an overhead (at least not now), but I would like to avoid performance issues later.
I see 2 options here.
The 1st one is to create 2 additional fields in the 'team' table (for example) and store the OverallSize and QualityLevel values there. If the data in the sector table changes, update those fields too (probably it would be good to do that with triggers, as the sector table doesn't change too often).
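I imagine the 1st option would look roughly like this (the new column names and the trigger name are invented here, and I assume 'team' has an 'Id' primary key; only the UPDATE case is shown):

ALTER TABLE team
  ADD COLUMN overall_size INT NOT NULL DEFAULT 0,
  ADD COLUMN level_sum INT NOT NULL DEFAULT 0; -- quality = level_sum / 8 (8 sectors per team)

CREATE TRIGGER team_sector_au AFTER UPDATE ON team_sector
FOR EACH ROW
  UPDATE team
  SET overall_size = overall_size + NEW.size - OLD.size,
      level_sum = level_sum + NEW.level - OLD.level
  WHERE Id = NEW.team_id;
-- analogous AFTER INSERT / AFTER DELETE triggers would be needed if sectors can be added or removed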
The 2nd option is to create a view that provides the required data.
The 2nd option seems much easier to me, but I don't have a lot of experience or knowledge of working with views.
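As far as I understand, the view would look something like this (the view name is arbitrary):

CREATE VIEW team_stadium_stats AS
SELECT team_id,
       SUM(size) AS OverallSize,
       SUM(level) / COUNT(*) AS QualityLevel
FROM team_sector
GROUP BY team_id;

and the searches would become simple queries against it:

SELECT * FROM team_stadium_stats ORDER BY OverallSize DESC;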
Q1: What is the best option here from your perspective, and why? Perhaps you could suggest other options?
Q2: Can I create a view in such a way that it performs the calculations rarely (say, once per day)? If yes, how?
Q3: Is it reasonable to use triggers for such a purpose (the 1st option)?
P.S. MySQL 5.1 is used; the overall number of teams is around 1-2 thousand, and the sector table holds about 6-8 thousand records overall. I understand those numbers are pretty small, but I would like to follow best practice here.
I wouldn't add calculated fields to your source tables. Keep your source data separate from calculated data by using separate calculated tables instead. You can use a one-to-one mapping identified by a shared PK to increase performance by reducing indexes and such (the PK of a source row equals the PK of the corresponding row in the calculated table).
The upside is that when you rebuild the DB, the absence of those tables makes it clear the calculated data is stale. It also allows shortcuts such as clearing all the calculated data by simply dropping the tables, for instance from a cron job. The calculated rows can also keep a timestamp of when the data was computed; if the maximum cache period has expired, the data can be recalculated on the fly as it is loaded, or as a nightly batch when the servers are quiet.
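A sketch of such a calculated table (all names here are illustrative):

CREATE TABLE team_stats (
  team_id       INT NOT NULL PRIMARY KEY,  -- same PK value as the source 'team' row
  overall_size  INT NOT NULL,
  quality_level DECIMAL(8,4) NOT NULL,
  calculated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- rebuild, e.g. from a nightly cron job:
TRUNCATE TABLE team_stats;
INSERT INTO team_stats (team_id, overall_size, quality_level)
SELECT team_id, SUM(size), SUM(level) / COUNT(*)
FROM team_sector
GROUP BY team_id;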
A few (ten) thousand records are nothing to worry about.
Best practices are:
- store data in a normalized fashion and let the database engine handle calculations
- index your data properly and do index maintenance now and then (see the index example after this list)
- avoid storing aggregated values with "parent" records
- do some result caching in the application layer to avoid hitting the DB server more often than necessary
- deal with performance issues when you get them
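For the query in the question, proper indexing means an index on the grouping column; a composite index even lets MySQL resolve the whole aggregation from the index alone (the index name is arbitrary):

ALTER TABLE team_sector ADD INDEX idx_team_sector (team_id, size, level);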
Yes, the database will calculate the SUM() whenever the view/query is executed, but I would expect results to be practically instant for the scenario you describe.
If you encounter a really complicated view that takes a long time to calculate and you cannot find any way to optimize your tables further, you can introduce a helper table that is filled with the view's results regularly (or via triggers) and query that table instead of the slow view.
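Regarding Q2: a plain view cannot cache its results, but MySQL 5.1's event scheduler (available from 5.1.6) can refresh such a helper table once per day. A sketch, reusing the illustrative 'team_stats' table from above (the event name is arbitrary):

SET GLOBAL event_scheduler = ON; -- requires sufficient privileges

CREATE EVENT refresh_team_stats
ON SCHEDULE EVERY 1 DAY
DO
  REPLACE INTO team_stats (team_id, overall_size, quality_level)
  SELECT team_id, SUM(size), SUM(level) / COUNT(*)
  FROM team_sector
  GROUP BY team_id;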
IMHO, anticipating possible performance bottlenecks and "closing" them before they actually show up is a waste of your time.