I want to create a 'Google Analytics'-type application for the web, i.e. a web-based tool to do some reporting and graphing against my database. The problem is that the database is HUGE, so I can't run the queries in real time: they would take too long and the tool would be unresponsive.
How can I use a cron job to help me? What is the best way to make my graphs responsive? I think I will need to denormalize some of my database tables, but how do I make these queries faster? What intermediate values can I store in another database table to speed things up?
Thanks!
Business Intelligence (BI) is a pretty mature discipline - and you'll find answers to your questions in any book on scaling databases for reporting & data warehousing.
A high-level list of tactics would include:
- partitioning (because indexes are little help for most reporting)
- summary tables (usually generated by a batch process submitted via cron; see the sketch after this list)
- a good optimizer (some databases, like MySQL, lack one and so make poor join decisions)
- query parallelism (some databases will provide linear speedups just by splitting your query into multiple threads)
- star schema (a good data model is crucial to good performance)
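To make the summary-table tactic concrete, here is a minimal sketch of a nightly batch job you could run from cron. It assumes a MySQL database reached through mysql-connector-python, and the table and column names (raw_pageviews, daily_pageviews, viewed_at, visitor_id) are hypothetical placeholders for whatever your schema actually looks like.

```python
"""Nightly summary refresh, meant to be scheduled from cron, e.g.:

    5 0 * * * /usr/bin/python3 /opt/reports/refresh_summary.py

All table/column names here are hypothetical -- substitute your own schema.
"""
import datetime
import mysql.connector  # assumption: mysql-connector-python is installed


def refresh_yesterday(conn):
    """Re-aggregate yesterday's raw rows into the daily summary table."""
    today = datetime.date.today()
    start = today - datetime.timedelta(days=1)
    cur = conn.cursor()
    # REPLACE keeps the job idempotent if (day, page_id) is the primary key,
    # so re-running the cron job after a failure never double-counts.
    cur.execute(
        """
        REPLACE INTO daily_pageviews (day, page_id, views, unique_visitors)
        SELECT DATE(viewed_at), page_id, COUNT(*), COUNT(DISTINCT visitor_id)
        FROM raw_pageviews
        WHERE viewed_at >= %s AND viewed_at < %s
        GROUP BY DATE(viewed_at), page_id
        """,
        (start, today),
    )
    conn.commit()
    cur.close()


if __name__ == "__main__":
    conn = mysql.connector.connect(
        host="localhost", user="report", password="secret", database="analytics"
    )
    try:
        refresh_yesterday(conn)
    finally:
        conn.close()
```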
In general, dynamic reporting beats the pants off static reporting. So if you're after powerful reporting, rather than running reports in batch I'd copy the data into an appropriate model, use aggregates, and possibly change databases to get a good optimizer and the appropriate features.
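And here is a sketch of the query the dashboard itself would run once that summary table exists, using the same hypothetical schema as above: each chart reads a handful of pre-aggregated rows instead of scanning the raw event log, which is what keeps the graphs responsive.

```python
import mysql.connector  # assumption: same connection setup as the cron sketch


def pageviews_last_30_days(conn, page_id):
    """Return [(day, views), ...] for one page -- cheap enough to run per request."""
    cur = conn.cursor()
    cur.execute(
        """
        SELECT day, views
        FROM daily_pageviews
        WHERE page_id = %s AND day >= CURDATE() - INTERVAL 30 DAY
        ORDER BY day
        """,
        (page_id,),
    )
    rows = cur.fetchall()
    cur.close()
    return rows
```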