开发者

Analytic / Reporting - same or separate database, and which DB?

开发者 https://www.devze.com 2023-02-01 17:02 出处:网络
I have a user content website with some business functionality. All tables are in 1 database. Now I am adding analytic with in dept reporting based on the activity and user log tables - breaking it do

I have a user content website with some business functionality. All tables are in 1 database. Now I am adding analytic with in dept reporting based on the activity and user log tables - breaking it down to have in dept reports by each day of the year, by each activity type, etc. The question is: do I create a separate database for analytic (or as people call it the data warhouse) or do I just add those new tables to existing database? If I have to create a separate DB for this, then that means I need to load in all the data from the main DB into temp tables in the Analyic DB then load that data into the analytic tables I assume?

The analytic requirements is as close to real time as possible so based on this I am not sure which DB to use if I do go for a separate one. Can MySQL that I use do the job of providing real time analytic, that is user takes an action a开发者_Go百科nd the next second if he views a report the numbers will be already aggregated?


It depends on the amount of reporting you're expecting. Transaction processing databases are generally designed in 3NF for efficient inserts.

Reporting is more complex due to the number of joins required. Also adding an significant extra number of SELECT transactions from reporting has the potential to degrade performance, hence why reporting databases are used.

It's up to you to weigh the likely reporting load and performance impact against setting up a reporting replica and ETL to populate it. Also you need to determine if you have a replica, how often to replicate. There is an argument you can use against the 'real time' requirement that the business reporting may be more 'consistent' if the business are reporting against a fixed snapshot of data (e.g. a daily copy).

See Strategies for populating a Reporting/Data Warehouse database for approaches to load data into a reporting database.


It's really all about the hardware at this point. If you are going to locate the analytical database on the same system(hard-drive) as the application, you are not going to see a whole lot of performance improvement anyway you cut it. Your speed is slowed by your disk scans...one disk is only going to scan so fast, regardless of database separation.

However, if I had to implement with real-time reporting... I would do a second database for user activity and monitoring. I'd insert into this database as users perform activities. During reporting, I'd join across databases (which will slow the system down, but since you are on the same box, you have no alternative). I would put the user activities on a separate database, so it's easier to backup and clean old data out as time goes by.

If you are able to separate your reporting system from your application system, I would do a 15 ETL/sync job which copies only the tables you need over to the reporting database on a different system. I would then report off that system. Obviously the users have a 15 min delay, but this allows swifter reporting. However, this would not be a true data warehouse, but an ad-hoc solution to meet your specific need.

0

精彩评论

暂无评论...
验证码 换一张
取 消