开发者

Datawarehouse Fact table question

开发者 https://www.devze.com 2023-01-16 10:04 出处:网络
I have a fact table called Loans.In this table in a loan identifier, the date the loan was made, and the amount of the loan.

I have a fact table called Loans. In this table in a loan identifier, the date the loan was made, and the amount of the loan.

The business requirement I don't quite know how to do in the datawarehouse is the followi开发者_如何学JAVAng. The loan amount can be adjusted in the future. So lets say August 1st we make a loan with an id of 1 and a amount of 20,000. October 1st this loan is adjusted to 19,000. Do I put two entries in the fact table with the same identifier, and different dates and amounts?

Do I create a new table (dimension table) with loan amount and date in it? What is the best way to do this scenario?

In the production database we have a table for total loan amount, and then a table off of that for loanAmounts so a combination of loanAmounts equals totalLoanAmount.


I would always recommend treating such fact tables as fully-summable, having what are effectively movement transactions for any uplift, payment, interest etc - if you do this you can have absolute flexibility to report over time, filter by transaction type etc - storing multiple version rows in the same fact table or updating a snapshot table maybe other options for reporting when you are amalgamating different metrics across dimensions but at the base level it is usual to have these line by line events itemised.

M


Treat the LoanID as a degenerate dimension and enter the correction separately. You may also use a transaction type dimension to describe: loans, payments, corrections etc.

DateKey CustomerKey TransacionTypeKey LoanID    Amount
---------------------------------------------------------
20100801    175          1               1     20000.00
20101001    175          2               1    - 1000.00

Show all loans by customer, by loan

select
      CustomerKey
    , LoanID
    , sum(Amount) as Amt
from factLoan
group by CustomerKey, LoanID
order by CustomerKey, LoanID ;


this is called slowly changing dimensions, and it depends on the business requirements, weather or not you want to keep trace of the changes that occurred in that row or not. There is typically 3 types of slowly changing dimensions.

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.

In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.

you can refer to kimball manual for more information about slowly changing dimensions.


Whether you actually need multiple versions depends on requirements. Do you only need to report on the loan amounts as-is or do you need to know the as-was position as well? If you aren't sure then it probably makes sense to keep the history (multiple versions). My assumption would be to keep the history.

I would suggest creating new rows for each new version and date column(s) to represent the date(s) the version applies.


As you say, Loan is a fact. Loans can be adjusted, meaning that the loan amount can be increased or decreased. There are two general ways to deal with this:

  1. Insert a new row with the delta to the fact. In this there are now two rows for the loan. This means that the primary key for the Loan fact cannot be Loan ID alone. It has (logically, not necessarily physically) to be Load ID and Date, or Loan ID and Entry Number (with date as another attribute). The result would be as you stated. I would change TransactionTypeKey to TransactionTypeCode, which is more correct naming.

  2. Update the fact with the new balance. In this case the final result is stored but the changes are lost. The primary key is LoanID; date is an attribute.

Most other applications, such as order processing, will use the first solution, which is what I think you need. Solution 1 has the advantage also that it is fully additive.

As to the comment from Saad El Oualji, this is not a case of SCD at all but of a detailed fact design. His description of SCD is correct but SCD describes dimension changes, not fact changes.

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号