Data historic in a business application_问答_开发者

I have work with a few databases up to now and the philosophys where verry different. It got me wondering,

Is it a good idea to duplicate tables for historic purpose in a business application?

By buisiness application i mean : a software used by an enterprise to manage all of his data (eg. invoices, clients, stocks [if applicable], etc)

By 'duplicating tables' i mean : when, lets say your invoices, goes out of date (like after one year, after being invoiced and paid, w/e), you can store them into 'historic' tables which makes them aviable for consultation but shouldent be modified. Same thing clients inactive for years.

Pros :

Using historic tables can accelerate researches trough actually used data since it make your actually used tables smaller.

Better separation of historic and actual d开发者_Python百科ata

Easier to remove data from the database to store it on hard media without affecting your database, (more predictable beacause the data had no chance of being used since it was in an historic table). This often happend after 10 years when you got unused data.

Cons :

Make your database have up to 2 times more tables.

Make your database more complex

Make your program more complex for reports since you sometimes have to import twice the amount of tables.

Archiving is a key aspect of enterprise applications, but in general, I'd recommend against it unless you really, really need it.

Archiving means you either accept you can't get at historical data before a specific date, or that you create some scheme for managing "current" and "historical" data; your solution (archive tables) is one solution to this problem.

Neither solution is all that nice - archive tables mean lots of duplicated code/data, complex archival procedures (esp. with foreign key relationships), lots of opportunity for errors.

I do believe the concept of "time" should be baked into the domain and data model for most business applications, along with mutability - you shouldn't be able to change an order once it's been confirmed, but you should be able to add products to a new order.

As for your pros:

In general, I don't think you'd notice the performance impact unless you're talking about very, very large scale businesses. I don't think - on modern SQL server solutions - you'd notice the speed difference between querying 10.000 customer records or 1.000.000 customer records.

The definition of "historic" is actually rather tricky - most businesses have to keep historical around for regulatory and tax purposes, often for many years; they'll probably want to be able to analyse trends over several years, etc. If the business wants to see "how many widgets did we sell per month over the last 5 years", that means you have to keep 5 years of data around somehow (either "raw" or pre-aggregated).

Yes, separating out data would be easier. Building a feature today - which you have to maintain every time you change the application - for pay-off in 10 years seems a poor investment to me...

I would only have a "duplicate" type table to store historic VERSIONS of each record, like a change log. Even a change log is not a duplicate as it would have to have info on when it was changed, etc. As a general practice,I would not recommend migrating rows from an active to a historical table. You'd have to manage different versions of queries to find the data in two places! Use a status to control if the data can be changed. I could see it may be done if there are certain circumstances for a particular application. Once you start adding foreign keys, it becomes difficult to remove data. If you had a truly enterprise business application and you attempted to remove invoices, you have all sorts of issues with FKs to other tables, accounts payable/receivable, costs of raw materials, profits from sales, shipping info, etc.