Primary Key / Clustered key for Junction Tables

Developer https://www.devze.com 2022-12-22 15:10 Source: the web
Let's say we have a Product table, an Order table, and a junction table, ProductOrder.

ProductOrder will have a ProductID and an OrderID.

In most of our systems these tables also have an autonumber column called ID.

What is the best practice for placing the primary key (and therefore the clustered key)?

  • Should I keep the primary key on the ID column and create a non-clustered index on the foreign key pair (ProductID and OrderID)?

  • Or should I put the primary key on the foreign key pair (ProductID and OrderID) and put a non-clustered index on the ID column (if that is even necessary)?

  • Or ... (smart remark by one of you :))


I know these words might make you cringe, but "it depends."

Most likely you want the rows ordered by ProductID and/or OrderID and not by the autonumber (surrogate) column, since the autonumber has no natural meaning in your database. You probably want to order the join table by the same column as the parent table.

  1. First understand why and how you are using the surrogate key ID in the first place; that will often dictate how you index it. I assume you are using the surrogate key because you are using some framework that works well with single-column keys. If there is no specific design reason and the autonumber ID brings no other benefit, then for a join table I'd simplify the problem and just remove it. The primary key then becomes (ProductID, OrderID). If you do keep the ID, you need to at least make sure your index on the (ProductID, OrderID) tuple is unique to preserve data integrity.

  2. Clustered indexes are good for sequential scans and joins where the query needs the results in the same order as the index. So look at your access patterns: figure out by which key(s) you will be doing sequential, multi-row selects or scans, and by which key you'll be doing random, individual-row access. Create the clustered index on the key you'll scan most, and a non-clustered index on the key you'll use for random access. You have to choose one or the other, since a table can have only one clustered index.
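
To make the two points above concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in engine. That is an assumption on my part: the question does not name a database, and SQLite has no CLUSTERED keyword. The closest analogue is a WITHOUT ROWID table, which stores rows in primary-key order. The sketch drops the surrogate ID, makes the pair the primary key, and checks that the key both rejects duplicates and serves a multi-row lookup on its leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: no surrogate ID, the pair itself is the primary key.
# WITHOUT ROWID makes SQLite store the rows in primary-key order, which is
# the nearest SQLite equivalent of clustering on (ProductID, OrderID).
conn.execute(
    """
    CREATE TABLE ProductOrder (
        ProductID INTEGER NOT NULL,
        OrderID   INTEGER NOT NULL,
        PRIMARY KEY (ProductID, OrderID)
    ) WITHOUT ROWID
    """
)

# Insert the rows in an arbitrary order.
conn.executemany(
    "INSERT INTO ProductOrder (ProductID, OrderID) VALUES (?, ?)",
    [(2, 10), (1, 11), (1, 10)],
)


def insert_duplicate():
    """Try to insert an existing pair; return True if the key rejected it."""
    try:
        conn.execute("INSERT INTO ProductOrder (ProductID, OrderID) VALUES (1, 10)")
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_rejected = insert_duplicate()

# Multi-row lookup on the leading key column, the access pattern that
# benefits from clustering on this key.
rows_for_product_1 = conn.execute(
    "SELECT ProductID, OrderID FROM ProductOrder "
    "WHERE ProductID = 1 ORDER BY OrderID"
).fetchall()

print(duplicate_rejected, rows_for_product_1)
```

If most of your multi-row lookups are by order rather than by product, the same sketch applies with the key columns swapped.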

NOTE: If you have conflicting requirements, there is a technique ("trick") that may help. If every column a query needs is present in an index, the database engine can answer the query from that index alone, without touching the table (a covering index). You can use this fact to keep the data in more than one order, even when those orders conflict with one another. Just be aware of the pros and cons of adding more columns to an index, and make a conscious decision after understanding the nature and frequency of the queries that will be processed.
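
A rough illustration of that trick, again with Python's sqlite3 as an assumed stand-in engine. The Quantity column and the index name ux_product_order are hypothetical and only there to contrast a query the index covers with one it does not. SQLite's EXPLAIN QUERY PLAN reports when it can answer a query from the index alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: the surrogate ID is the table's storage key, and
# Quantity is an extra column that is NOT part of the index below.
conn.execute(
    """
    CREATE TABLE ProductOrder (
        ID        INTEGER PRIMARY KEY AUTOINCREMENT,
        ProductID INTEGER NOT NULL,
        OrderID   INTEGER NOT NULL,
        Quantity  INTEGER NOT NULL
    )
    """
)
conn.execute(
    "CREATE UNIQUE INDEX ux_product_order ON ProductOrder (ProductID, OrderID)"
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Every column this query touches is in the index, so the index covers it.
covered_plan = plan("SELECT OrderID FROM ProductOrder WHERE ProductID = 1")

# Quantity is not in the index, so the engine must also visit the table.
uncovered_plan = plan("SELECT Quantity FROM ProductOrder WHERE ProductID = 1")

print(covered_plan)
print(uncovered_plan)
```

Adding Quantity to the index would make the second query covered as well, at the cost of a wider index and more work on every insert, which is the trade-off mentioned above.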


The correct and only answer is:

  • Primary key is ('orderid' , 'productid')
  • Another index on ('productid' , 'orderid')
  • Either can be clustered, but the PK is clustered by default

Because:

  • You don't need an index on orderid or productid alone: the optimiser will use whichever of the two composite indexes leads with the column it needs
  • You'll most likely use the table "both" ways
  • You don't need a surrogate key because you already have them on the linked tables, so a third column just wastes space.
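
A small sketch of this layout, using Python's sqlite3 as an assumed stand-in (the question does not name an engine, and the index name ix_product_order is made up for the example). It creates the primary key on (OrderID, ProductID), adds the mirrored index, and asks the planner how it would look up rows in each direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Primary key on (OrderID, ProductID). WITHOUT ROWID makes SQLite store the
# rows in primary-key order, its nearest equivalent of a clustered PK.
conn.execute(
    """
    CREATE TABLE ProductOrder (
        OrderID   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL,
        PRIMARY KEY (OrderID, ProductID)
    ) WITHOUT ROWID
    """
)

# The mirrored index serves lookups that start from the product side.
conn.execute(
    "CREATE INDEX ix_product_order ON ProductOrder (ProductID, OrderID)"
)


def plan(sql):
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)


# "Which products are on order 7?" can seek on the primary key.
by_order_plan = plan("SELECT ProductID FROM ProductOrder WHERE OrderID = 7")

# "Which orders contain product 3?" can seek on the mirrored index.
by_product_plan = plan("SELECT OrderID FROM ProductOrder WHERE ProductID = 3")

print(by_order_plan)
print(by_product_plan)
```

Neither lookup needs a full scan, and no single-column index on OrderID or ProductID is required.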


This appears to be for a dynamic system where many orders will be added. The clustered index should therefore be on your autonumbered column.

You can make that column the primary key and put another unique index on the pair of columns. Or you can make the pair of columns the primary (but non-clustered) key.

The choice of using the primary key or a unique index key is up to you. But I would make sure that the one that is clustered is for your autonumber column.
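
A minimal sketch of this arrangement, with Python's sqlite3 as an assumed stand-in engine. In an ordinary SQLite table an INTEGER PRIMARY KEY is the rowid, and rows are stored in rowid order, so this plays the role of clustering on the autonumber column. The sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Rows are stored by ID (the rowid), so new rows always go at the end.
# The UNIQUE constraint on the pair still protects data integrity.
conn.execute(
    """
    CREATE TABLE ProductOrder (
        ID        INTEGER PRIMARY KEY AUTOINCREMENT,
        ProductID INTEGER NOT NULL,
        OrderID   INTEGER NOT NULL,
        UNIQUE (ProductID, OrderID)
    )
    """
)

# Pairs arrive in no particular key order, as they would in a busy system.
pairs = [(5, 100), (1, 101), (3, 100)]
ids = [
    conn.execute(
        "INSERT INTO ProductOrder (ProductID, OrderID) VALUES (?, ?)", pair
    ).lastrowid
    for pair in pairs
]

# A repeated pair is rejected by the unique index, not by the primary key.
try:
    conn.execute("INSERT INTO ProductOrder (ProductID, OrderID) VALUES (5, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(ids, duplicate_rejected)
```

The IDs increase with insertion order regardless of the ProductID and OrderID values, which is why inserts always land at the end of the table in this layout.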


My preference has always been to create an autonumber column for primary keys. Then I create a unique index on the two foreign keys so that the pair cannot be duplicated.

I do this because the more I normalize my data, the more keys I have to use in joins. I have ended up with designs going six to seven levels deep, and if I use composite keys flowing from one level to the next, I could potentially end up with n^2 keys in the join.

Try convincing my SQL Developers to use all of that for a single query, and they will really like me.

I keep it simple.
