开发者

Data Warehousing Postgres

开发者 https://www.devze.com 2022-12-08 00:38 出处:网络
We\'re considering using SSIS to maintain a PostgreSql data warehouse. I\'ve used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postg

We're considering using SSIS to maintain a PostgreSql data warehouse. I've used it before between SQL Servers with no problems, but am having a lot of difficulty getting it to play nicely with Postgres. I’m using the evaluation version of the OLEDB PGNP data provider (http://www.postgresql.org/about/news.1004).

I wanted to start with something simple like UPSERT on the fact table (10k-15k rows are updated/inserted daily), but this is proving very difficult (not to mention I’ll want to use surrogate keys in the future).

I’ve attempted (Link) and (http://consultingblogs.emc.com/jamiethomson/archive/2006/09/12/SSIS_3A00_-Checking-if-a-row-exists-and-if-it-does_2C00_-has-it-changed.aspx) which are effectively the same (except I don’t really understand the union all at the end when I’m trying to upsert) But I run into the same problem with parameters when doing the update using a OLEDb command – which I tried to overcome using (http://technet.microsoft.com/en-us/library/ms141773.aspx) but that just doesn’t seem to work, I get a validation error –

The external columns for complent.... are out of sync with the datasource columns... external column “Param_2” needs to be removed from the external columns. (this error is repeated for the first two parameters as well – never came across this using the sql connection as it supports named parameters)

Has anyone come across this?

AND:

The fact that this simple task is apparently so difficult to do in SSIS suggests I’m using the wrong tool for the job - is there a better (and still flexible) way of doing this? Or would another ETL package be better for use between two Postgres database? -Other options include any listed on (http://en.wikipedia.org/wiki/Extract,_transform,_load#Open-source_ETL_frameworks). I could just go and write a load of SQL to do this for me, but I wanted a neat and easily maintai开发者_如何学Pythonnable solution.


I have used the Slowly Changing Dimension wizard for this with good success. It may give you what you are looking for especially with the Wizard

http://msdn.microsoft.com/en-us/library/ms141715.aspx

The External Columns Out Of Sync: SSIS is Case Sensitive - I encountered this issue multiple times and it makes me want to pull my hair out.

This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.


SCD is way too slow for what I want. I need to use set based sql.

It turned out that a lot of my problems were with bugs in the provider. I opened a forum topic (http://www.pgoledb.com/forum/viewtopic.php?f=4&t=49) and had a useful discussion with the moderator/support/developer person.

Also Postgres doesn't let you do cross db querys, so I solved the problem this way:

  • Data Source from Production DB to a temp Archive DB table
  • Run set based query between temp table and archive table
  • Truncate temp table

Note that the temp table is not atchally a temp table, but a copy of the archive table schema to temporarily stored data in.

Took a while, but I got there in the end.

This simple task is going to take some work either way. SSIS is by no means an enterprise class ETL product yet, but it does give you some quick and easy functionality, and is sufficient for most ETL work. I guess it is also about your level of comfort with it as well.

What enterprise ETL solution would you suggest?

0

精彩评论

暂无评论...
验证码 换一张
取 消