I have been scraping a particular site and storing the results in a database. My original assumptions about the data allowed a schema with fairly reasonable composite primary keys (usually containing only 2 or 3 fields). As time went on, I realized that those assumptions were wrong and my primary keys were not as unique as I thought they were, so I have slowly been expanding them to contain more and more fields. In fact, I have recently come to believe that their database has no constraints whatsoever.
Just today, I finally expanded the primary key for one of my tables to contain every field in that table, and I thought now would be a good time to ask: is it better to add an auto-incrementing column that is just a unique id, or to leave a composite primary key on the entire table?
You're better off with a single-column primary key than using all fields as a primary key.
First, your tools will have an easier time recognizing it. I'm sure there are half a dozen or so other reasons, but this seems like a no-brainer to me.
Surrogate keys all the way - they're just easier to work with.
Then again, I have been playing a lot with Entity Framework and my view could be clouded by that.
@Jack - if you keep adding columns to a composite primary key, only to find out that it takes every column to make a row unique, then you don't know enough about how the database was built. I agree with you that adding an auto-incrementing primary key is the solution.
The reason to have a large composite key would be to ensure uniqueness. Using a single surrogate key provides more convenience when you need a foreign key, but you could inadvertently wind up with the same data in different rows with different serial numbers.
You can get the benefit of both (without hashing all the attributes yourself) by using a serial auto-incrementing surrogate key, and separately imposing a uniqueness constraint on the attributes that would have been included in the big composite key.
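CREATE TABLE example (
    surrogate_key SERIAL PRIMARY KEY,   -- single column for foreign keys to reference
    this VARCHAR(5) NOT NULL,
    should INT NOT NULL,
    "all" BOOLEAN NOT NULL,             -- ALL is a reserved word, so the name is quoted
    be VARCHAR(2) NOT NULL,
    different VARCHAR(3) NOT NULL,
    UNIQUE (this, should, "all", be, different)   -- the would-be composite key
);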
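To see how the two constraints behave together when a scraper loads rows, here is a minimal sketch in Python. It assumes SQLite (through the standard `sqlite3` module) as a stand-in for whatever database is actually in use, and the `scraped_item` table, its columns, and the `insert_item` helper are hypothetical names chosen for illustration.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
# The table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE scraped_item (
        id       INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        title    TEXT NOT NULL,
        category TEXT NOT NULL,
        price    INTEGER NOT NULL,
        UNIQUE (title, category, price)              -- natural key
    )
    """
)


def insert_item(title, category, price):
    """Insert a row; return True if stored, False if it was a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO scraped_item (title, category, price) "
                "VALUES (?, ?, ?)",
                (title, category, price),
            )
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE constraint rejected a row with identical attributes.
        return False


first = insert_item("widget", "tools", 999)
second = insert_item("widget", "tools", 999)  # same attributes again
row_count = conn.execute("SELECT COUNT(*) FROM scraped_item").fetchone()[0]
```

Here the second insert is rejected by the uniqueness constraint, so the table holds one row, while other tables can still reference the narrow `id` column.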
The only time I ever use a composite key is when it consists of two integer fields in a linking table for a many-to-many relationship. Use a surrogate key and then put a unique index on the fields you would have put into the composite key. This way you save space in child tables, get the improved speed of an integer join (I would not use a GUID unless I was actually going to use replication), and preserve the uniqueness of the natural key.
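The linking-table case might look like the sketch below. It again assumes SQLite via Python's `sqlite3` module, and the `author`, `book`, and `author_book` tables are hypothetical examples rather than anything from the question's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
conn.executescript(
    """
    -- Parent tables: surrogate integer key plus a unique natural key.
    CREATE TABLE author (id INTEGER PRIMARY KEY, name  TEXT NOT NULL UNIQUE);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT NOT NULL UNIQUE);

    -- Linking table: the composite key is just the two integer foreign keys.
    CREATE TABLE author_book (
        author_id INTEGER NOT NULL REFERENCES author(id),
        book_id   INTEGER NOT NULL REFERENCES book(id),
        PRIMARY KEY (author_id, book_id)
    );
    """
)

conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ann')")
conn.execute("INSERT INTO book (id, title) VALUES (10, 'Keys')")
conn.execute("INSERT INTO author_book VALUES (1, 10)")

try:
    conn.execute("INSERT INTO author_book VALUES (1, 10)")  # same link again
    duplicate_link_rejected = False
except sqlite3.IntegrityError:
    duplicate_link_rejected = True

link_count = conn.execute("SELECT COUNT(*) FROM author_book").fetchone()[0]
```

The linking table stores only two integers per row, and the composite primary key keeps the same pair from being linked twice.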
One way to get both the uniqueness of a large composite key and the convenience of a synthetic key is to use a secure hash of the values of all the fields. Personally I would SHA-1 the contents of all the fields, then Base64 or hex encode the digest and use it as my key. You get the benefit of having a single column to deal with, as well as the ability to tell whether the data is already in the database: hash all the fields and do a simple SELECT on the primary key to see if it already exists.
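A rough sketch of the hashing approach, using Python's standard `hashlib` and `sqlite3` modules. The `row_key` and `store` helpers and the `scraped_item` table are hypothetical. The length prefix on each field is an added assumption, not part of the answer above: without some delimiter, the field pairs `("ab", "c")` and `("a", "bc")` would concatenate to the same bytes and produce the same key.

```python
import hashlib
import sqlite3


def row_key(fields):
    """Return a hex-encoded SHA-1 digest over all field values.

    Each value is converted with str() and length-prefixed, so field
    boundaries are part of the hashed input. Note that str() makes the
    integer 1 and the string "1" hash identically.
    """
    h = hashlib.sha1()
    for value in fields:
        data = str(value).encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.hexdigest()


# Assumption: in-memory SQLite stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scraped_item ("
    "row_hash TEXT PRIMARY KEY, title TEXT NOT NULL, price INTEGER NOT NULL)"
)


def store(title, price):
    """Insert the row unless its hash is already present; return True if stored."""
    key = row_key((title, price))
    exists = conn.execute(
        "SELECT 1 FROM scraped_item WHERE row_hash = ?", (key,)
    ).fetchone()
    if exists:
        return False
    conn.execute("INSERT INTO scraped_item VALUES (?, ?, ?)", (key, title, price))
    return True
```

Calling `store("widget", 999)` twice stores the row once and returns `False` the second time. Another hash from `hashlib`, such as `hashlib.sha256`, could be swapped in without changing the structure.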