I would like your opinions on how to deal with data growth as it pertains to a MSSQL database. To keep things simple, I'll constrain our discussion to one table only. Say that the table grows to contain lots of data, i.e. the time to retrieve data is noticeable to the end user despite the fact that there are indexes.
Now, not all the data needs to be available at all times although should we need to do so. In other words, we can "archive" some of the old data. Our current strategy is simply to move the "archive" data to another table so that "day to day" queries are quick and we can access the "archived" data when we need to by com开发者_JAVA百科bining the two tables.
Use table partitioning.
Read these links the author details it with an example. http://blog.sqlauthority.com/2008/01/24/sql-server-2005-introduction-to-partitioning/
http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
I've seen that a lot, and it can certainly work. If you are using stored procedures, you can then maintain two sets where one is only for day to day, and the second will reference both tables or a view that combines the two sets.
Depending on how long you do that for, you still potentially end up with an archive table that has a lot of data at some point, so you may need to look at a strategy that has an end date before moving data into offline storage or a third table that is not searched as part of the process.
For SQL Server 2005+, take a look at table partitioning.
As a bit of clarification to the other answers, both strategies work well depending on which edition of Sql Server you are using. If you have enterprise edition then table partitioning is the way I would go as it requires less work on the development side. Unfortunately Microsoft has limited table partitioning to enterprise edition only, even though it is a feature in virtually every other database out there including the free open source alternatives. If you are stuck without enterprise edition (as many are considing it's >$20k/cpu pricing) then moving to manual partitioning approach as you first suggested is a good alternative. Another alternative that does not require enterprise edition is to use covering indexes. One good way to do that is to use a view and create a materialized index on the view, it is basically creating copy of only the data you need on disk without the overhead of you moving the data around yourself. Say you created a view with a where clause to only include today's data and then created a materialized index on the view, there would actually be an index containing all of the data from today on disk and the return will be as fast as if you had manually partitioned the data. Unfortunately there are many limits to the complexity of a view that you can create an index on but it's worth a shot, when it works it works well.
精彩评论