I have read that a staging table should be an exact column-by-column match of its target table in the data warehouse. If that is the case, is it best practice, after populating the staging table, not to do subsequent lookups to match keys to those in the dimension tables?
I guess my question is this: should dimension table key lookups be processed before data goes into a staging table?
It is best practice to populate staging data untouched, but that doesn't mean you cannot add metadata columns. As long as the staging data is fully traceable back to the untransformed source, you can add surrogate keys or other ETL-specific data, such as the extract time, if you wish.
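To make that concrete, here is a minimal sketch using Python's built-in `sqlite3`. The table `stg_sales`, its columns, and the helper `load_staging` are all made up for illustration and are not from any particular warehouse. The source values go in exactly as received, and only the two `etl_` columns are added.

```python
import sqlite3
from datetime import datetime, timezone


def load_staging(conn, source_rows, batch_id):
    """Copy source rows into staging unchanged, adding only ETL metadata."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS stg_sales (
            customer_code    TEXT,    -- source column, copied as-is
            amount           TEXT,    -- kept as text so nothing is coerced
            etl_extracted_at TEXT,    -- metadata: extract time (UTC, ISO 8601)
            etl_batch_id     INTEGER  -- metadata: which load produced the row
        )
        """
    )
    extracted_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO stg_sales VALUES (?, ?, ?, ?)",
        [(code, amount, extracted_at, batch_id) for code, amount in source_rows],
    )
    conn.commit()
    return extracted_at


# Hypothetical source extract; note the untrimmed amount is preserved.
conn = sqlite3.connect(":memory:")
source_rows = [("C001", "19.99"), ("C002", " 5.00 ")]
extracted_at = load_staging(conn, source_rows, batch_id=1)

staged = conn.execute(
    "SELECT customer_code, amount, etl_batch_id FROM stg_sales ORDER BY rowid"
).fetchall()
print(staged)
```

Because the source columns are never modified, every staged row can still be compared directly against the source system, and the metadata columns tell you when and in which batch it arrived.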
The normal practice in data warehousing is to populate staging data as-is (extract), then look up the dimension keys when upserting into an SCD (slowly changing dimension). The staging data shouldn't really contain warehouse-specific information, as it could be used for many purposes.
I'd be interested to know where you read that. There is no hard-and-fast rule, but most staging tables are a raw dump of the required source data, sometimes with some basic cleansing performed. I'd steer well clear of doing any lookups against your data warehouse at this point. The lookups should happen in the process that moves the data from staging to your warehouse.
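As a rough illustration of doing the lookup during the staging-to-warehouse step, here is a small sketch with Python's built-in `sqlite3`. The tables `stg_sales`, `dim_customer`, and `fact_sales`, and the function `load_warehouse`, are hypothetical names chosen for the example. It only inserts dimension members that are missing, so treat it as a simplification: a real SCD load (Type 2 history, for instance) needs more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Staging holds only raw source columns, no warehouse keys.
    CREATE TABLE stg_sales (customer_code TEXT, amount TEXT);

    -- The dimension maps the source business key to a surrogate key.
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_code TEXT UNIQUE
    );
    CREATE TABLE fact_sales (customer_key INTEGER, amount REAL);

    INSERT INTO stg_sales VALUES
        ('C001', '19.99'), ('C002', '5.00'), ('C001', '1.00');
    INSERT INTO dim_customer (customer_key, customer_code) VALUES (1, 'C001');
    """
)


def load_warehouse(conn):
    """Move staged rows into the warehouse, resolving keys on the way."""
    # Step 1: add dimension members that staging has but the dimension lacks.
    conn.execute(
        """
        INSERT INTO dim_customer (customer_code)
        SELECT DISTINCT s.customer_code
        FROM stg_sales s
        WHERE NOT EXISTS (
            SELECT 1 FROM dim_customer d
            WHERE d.customer_code = s.customer_code
        )
        """
    )
    # Step 2: the key lookup happens here, as a join while loading the fact.
    conn.execute(
        """
        INSERT INTO fact_sales (customer_key, amount)
        SELECT d.customer_key, CAST(s.amount AS REAL)
        FROM stg_sales s
        JOIN dim_customer d ON d.customer_code = s.customer_code
        """
    )
    conn.commit()


load_warehouse(conn)
facts = conn.execute(
    "SELECT customer_key, amount FROM fact_sales ORDER BY customer_key, amount"
).fetchall()
print(facts)
```

The staging table never gains a `customer_key` column, so the same staged data can feed other consumers, and the lookup logic lives in one place: the load step.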