I want to populate a star schema / cube in SSIS / SSAS.
I prepared all my dimension tables and my fact table, primary keys etc.
The source is a 'flat' (item level) table and my problem is now how to split it up and get it from one into the respective tables.
I did a fair bit of googling but couldn't find a satisfying solution to the problem. One would imagine that this is a rather common problem/situation in BI development?!
Thanks, alexl
For a start, it depends on whether you want to do a simple initial data transfer or something more sophisticated (e.g. incremental). I'm going to assume you're doing an initial data transfer.
Say your item table has columns as follows: id, cat1, cat2, cat3, cat4, ...
Assuming the dimension tables for categories 1-4 each have columns id and cat_name, you can load dim_cat1 (the dimension table of item category 1) as follows:
insert into dim_cat1 (cat_name)
select distinct cat1 from item_table;
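For reference, a dimension table with an auto-generated key might be declared something like this (a hypothetical sketch, assuming SQL Server syntax and a varchar category name; adjust to your actual types):
create table dim_cat1 (
    id int identity(1,1) primary key,  -- auto-generated surrogate key
    cat_name varchar(100) not null
);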
You can do the same for all of the other categories/dimension tables. I'm assuming your dimension tables have automatically generated IDs. Now, to load the fact table:
insert into fact_table (id, cat1_id, cat2_id, cat3_id, cat4_id, ...)
select it.id, dc1.id, dc2.id, dc3.id, dc4.id, ...
from item_table it
join dim_cat1 dc1 on dc1.cat_name = it.cat1
join dim_cat2 dc2 on dc2.cat_name = it.cat2
join dim_cat3 dc3 on dc3.cat_name = it.cat3
join dim_cat4 dc4 on dc4.cat_name = it.cat4
...
If you have a substantial amount of data, it might make sense to create indexes on the category names in the item_table and maybe the dimension tables.
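For example (hypothetical index names; repeat for the other category columns and dimension tables as needed):
create index ix_item_cat1 on item_table (cat1);
create index ix_dim_cat1_name on dim_cat1 (cat_name);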
Btw, this is a database-independent answer; I don't work with SSIS/SSAS. You might have tools available that streamline parts of this process for you, but it's really not that difficult or time-consuming to write in plain SQL.
We do this by using a data flow task to copy rows changed since the last package execution into temp staging tables, then updating the archive/warehouse from those staging tables based on a key, then inserting the rows that don't exist yet. Truncate the staging tables ready for next time and add a load of auditing. Job done.
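A rough T-SQL sketch of that update-then-insert step, using hypothetical table and column names (staging_sales, fact_sales, sale_id, amount) rather than anything from the original setup:
-- Update warehouse rows that already exist, matching on the key.
update f
set f.amount = s.amount
from fact_sales f
join staging_sales s on s.sale_id = f.sale_id;

-- Insert staging rows that the warehouse doesn't have yet.
insert into fact_sales (sale_id, amount)
select s.sale_id, s.amount
from staging_sales s
where not exists (select 1 from fact_sales f where f.sale_id = s.sale_id);

-- Empty the staging table ready for the next run.
truncate table staging_sales;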
I frequently build cubes against Operational Data Stores instead of star schemas. Performance will almost always be better with a star schema, but for prototyping/testing, don't be afraid to develop cubes against the data you HAVE, not the star schema you WANT.