开发者

Handling primary key duplicates in a data warehouse load

开发者 https://www.devze.com 2022-12-28 14:02 出处:网络
I\'m currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don\'t load duplicate rows I\'ve p

I'm currently building an ETL system to load a data warehouse from a transactional system. The grain of my fact table is the transaction level. In order to ensure I don't load duplicate rows I've put a primary key on the fact table, which is the transaction ID.

I've encountered a problem with transactions being reversed - In the transact开发者_Go百科ional database this is done via a status, which I pick up and I can work out if the transaction is being done, or rolled back so I can load a reversal row in the warehouse. However, the reversal row will have the same transaction ID and so I get a primary key violation.

I've solved this for now by negating the primary key, so transaction ID 1 would be a payment, and transaction ID -1 (In the warehouse only) would be the reversal.

I have considered an alternative of generating a BIT column, where 0 is normal and 1 is reversal, then making the PK the transaction ID and the BIT column.

My question is, is this a good practice, and has anyone else encountered anything like this? For reference, this is a payment processing system, so values will not be modified, so there will only ever be transactions and reversals.


In most cases, a fact table has a primary key which is a composite of several FKs. So, maybe you could use a combination of the TransactionID and a FK to dimTransactionType as the primary key.

The dimTransactionType would look something like:

TransactionTypeKey  integer
TransactionTypeName  varchar(20)

and would have

0, 'unknown'
1, 'normal'
2, 'reversal'

Tinkering with bits and flags is not recommended in a DW -- verbose as much as possible.


A common approach to designing fact tables is to use surrogate keys as the primary keys. A large integer value is usually sufficient. If transaction id is a foreign key to a dimension record, then it should not be used as a primary key in the fact table. Your surrogate key logic (i.e. trigger for a new record in the fact table) can be based on a combination of the transaction id and the transaction type.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号