I have two tables: tblIssue and tblIssueSubscriber for my newsletter application.
This is my normalized design:
tblIssues (newsletter issues masterlist)
--------------------
IssueId int PK
PublisherCode varchar(10)
IssueDesc varchar(50)
tblIssueSubscribers (newsletter subscribers)
-----------------
IssueId int FK
EmailAddress varchar(100)
but tblIssueSubscriber is expected to hold hundred thousands or even millions of record per week and it will be accessed frequently that's why Im leaning towards Table partitioning. My design is to partition the tblIssueSubscriber per PublisherCode (We have 8 publisherCode on our masterlist).
tblIssues
--------------
IssueId int PK
PublisherCode varchar(10)
IssueDesc varchar(50)
tblIssueSubscribers
-----------------
IssueId int FK
PublisherCode varchar(10)
EmailAddress varchar(100)
and then partiti开发者_如何学Gooned it per PublisherCode
CREATE PARTITION FUNCTION [PartitionPublisher] (varchar(10)) AS RANGE RIGHT FOR VALUES ('PUBLISHER1', 'PUBLISHER2', 'PUBLISHER3', 'PUBLISHER4', 'PUBLISHER5', 'PUBLISHER6', 'PUBLISHER7', 'PUBLISHER8');
I know that table partitioning adds complexity so my question is,
Is it worth partitioning tblIssueSubscriber, or should I stick to the normalized design?
First I think Size is a Red Herring. It's not a very useful argument since all size is relative and there are reasons to use partition irrespective of size.
Performance is only part of the reason. Ronnis makes some good points but it doesn't stop there.
There are two reasons to partition a table. One is performance, one is maintenance.
Let's start with maintenance.
In general DELETE is a 'bad' thing to do in a database. Say you mistakenly insert 1 million rows and then delete 1 million rows. Each of those deletes is logged generating UNDO and REDO records, which waste space and take time not only to make while deleting but again when 'played' for a point-in-time recovery. So what's better than delete? Truncate (or drop). When you have tables as you describe that are constantly growing, at some point you'd like to get rid of old records. This is why I say size is irrelevant - if you want to keep a year in that table, you'll need to remove records that are more than 12 months old - NO MATTER WHAT THAT SIZE IS. You could have a 300MB table or a 500GB table after 1 year of adding records - regardless you'll need/want to start deleting. So you can always just delete the rows with insert_dt < sysdate - 365. Or you could just drop or truncate that month/day's partition. A not logged transaction that will be less resource intensive.
There are other maintenance benefits like individually backing up partition or rebuilding indexes or moving to new tablespaces etc. Not sure what RDBMS you're using but you can load data via partition swaps in most. This allows you to make no changes to your final tables until all of the data is loaded and ready to go.
As far as performance goes...
The key here is that any query that doesn't include the partition key in the where clause will most likely perform worse than it did before partitioning. This isn't a GO_FASTER = TRUE type of setting. I've seen people implement partitioning and crush their systems. Ronnis' post is the basics of performance guideline in a single partitioned tables. If you have more than one table partitioned on the same key, some RDBMS's can parallelize the joins between them.
The query patterns will determine whether you will benefit from partitioning.
If your application is mostly about single row queries (typically primary key or indexed access), you will not see a performance gain from partitioning the table.
If your application is mostly about processing all the data publisher-wise, then you would benefit from partitioning by eliminating larger parts of the table when performing table scans.
It really depends on how large that database file is going to become and how many records you are going to have in there and what machine you are using. Do a rough calculation of how large you think it will become.
Roughly, lets say that database file will grow to 300 MB?
That is nothing... I would personally not partition it. I know some of our database clients who use partitioning, and they started partitioning when they expected the database to grow beyond 500 GB and that it ultimately may reach 4 TB. In that case, yes partition. But I suspect you are not going to go anywhere near that.
Plus, you can always partition later, no?
I would recommend a 64-bit machine, running Linux or Windows server 2008/Win7. And more memory is always good.
精彩评论