开发者

Partitioning by Year vs. separate tables named Data_2011, Data_2010, etc

开发者 https://www.devze.com 2023-04-07 22:25 出处:网络
We are designing a high volume SQL Server application that involves processing and reporting on data that is restricted within a specified year.

We are designing a high volume SQL Server application that involves processing and reporting on data that is restricted within a specified year.

Using Partitioning by year comes to mind.

Ano开发者_Go百科ther suggestion is to programmatically create separate physical table where the suffix of the name is the year and, when reporting is needed across years, to provide a view which is the union of the physical tables.

My gut tells me that this situation is what partitioning is design to handle. Are there any advantages to using the other approach?


From an internals perspective, the methods are essentially the same.

Behind the scenes, when you create a date-based partition the SQL engine creates separate physical tables for each partition, then does what is basically a UNION when you query the table itself.

If you use a filter in your query on the partitioned table that corresponds to your partitioning field (DateField let's say), then the engine can go directly to the partition that you need for the data. If not, then it searches each physical table in the logical table as needed to complete the query.

If your queries will involve a date filter (which it sounds like they will from your question) then I can think of no advantage to your "custom" method.

Essentially, the choice you need to make is do you want to be responsible for all the logic and corner cases involved in partitioning, or trust the developers at Microsoft who have been doing this for decades to do it for you?

For my own purposes, if there is a built-in framework for something I want to do then I always try to use it. It is invariably faster, more stable, and less error-prone than a "roll-your-own" solution.


Both of the solutions means that you have to do some metadata operations in the db. The question is whether you will do some changes/updates in the historical data? I was working on similar solution - bud instead of a year we were working a half year of the data. In this case we used partitioning by date - we have half year floating window keeping 2 years of historical data + current half year (HTD) in 10 partitions(each partition represents a separate quarter). We were updating the HTD data every day and once a week we were restating some of the historical data. In this case we were hitting only few partitions (the partition id was defined in where clause, the partitioning key was a date_id representing the calendar date in one of our dimensions). The whole table had about 250M of rows. Every half year the process is adjusting the partitioning, but the same you will have to do with the view. Using this approach we can always execute an update against the whole table (using the view you will have to test the the update scenario or run the update against separate tables). We have procedures in place which can truncate / switch out a specified partition of the table so the manipulation is quick.

It is difficult to say which is the best option. But in general I would suggest to use the tables in the case that you really are not changing the history (I would go for 1 partitioned table for history and 1 table for current data)


I feel using partitioning with a date driven paritioning key is like using a hammer to drive in a screw...'that must have been why they invented the hammer'...Partitioning is good when you need parallel processes to run as in data marts or you partition on some arbitrary key e.g. and identity column. In your case, the business requirement is simply to keep multiple years of history. In order to use partitioning, the app team would need to create a routine that dynamically generates the partitioning constraint, which is DDL and is the responsibility of the DBA team. The multi-table/union view provides a much simpler soluiton.

0

精彩评论

暂无评论...
验证码 换一张
取 消