
How to handle added input data columns without having to maintain multiple versions of SSIS packages?


I’m writing to solicit ideas for a solution to an upcoming problem.

The product that provides data to our ETL process currently has multiple versions. Our clients are all using some version of the product, but not all use the same version and they will not all be upgraded at the same time.

As new versions of the product are rolled out, the most common change is to add new data columns. Columns being dropped or renamed may happen occasionally, but our main focus right now is how to handle new columns being added.

The problem that we want to address is how to handle the data for clients who use an older version of the product. If we don’t account for the new columns in our SSIS packages, then the data in those columns for clients using an older product version will not be processed.

What we want to avoid is having to maintain a separate version of the SSIS packages for each version of the product. Has anyone successfully implemented a solution for this situation?


Well, I had to do something similar where I received different files in different formats from different vendors that all had to go to the same place. What I did was create a For Each Loop Container that runs through the files, and the first step of the loop determines which path each file goes down. Then I wrote a separate data flow for each path.

You could do this with a table that lists the expected columns per version, and then send each file down the path for the version it matches.
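Roughly what that control table and the matching step could look like, as a minimal sketch (all table, column, and value names here are made up for illustration, not something you have to use):

    -- Hypothetical control table: one row per (product version, expected column).
    CREATE TABLE dbo.VersionColumn (
        ProductVersion  varchar(20)  NOT NULL,
        ColumnName      sysname      NOT NULL,
        OrdinalPosition int          NOT NULL,
        CONSTRAINT PK_VersionColumn PRIMARY KEY (ProductVersion, ColumnName)
    );

    -- Given the number of columns found in the incoming file (e.g. counted
    -- from the header row), pick the version whose expected column count matches.
    DECLARE @FileColumnCount int = 42;   -- value read from the file header

    SELECT TOP (1) ProductVersion
    FROM dbo.VersionColumn
    GROUP BY ProductVersion
    HAVING COUNT(*) = @FileColumnCount
    ORDER BY ProductVersion DESC;        -- prefer the newest matching version

An Execute SQL Task can run the lookup, put the result in a package variable, and the precedence constraints or conditional split then route on that variable.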

Alternatively, if you know which version each customer is on, you could have a table storing that and determine the path from the customer ID.
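A sketch of that lookup, again with invented names:

    -- Hypothetical mapping of each customer to the product version they run.
    CREATE TABLE dbo.CustomerProductVersion (
        CustomerId     int          NOT NULL PRIMARY KEY,
        ProductVersion varchar(20)  NOT NULL
    );

    -- An Execute SQL Task could populate a package variable with the version
    -- for the customer whose file is currently being processed, and the loop
    -- branches on that variable.
    DECLARE @CustomerId int = 1001;   -- e.g. parsed from the incoming file name

    SELECT ProductVersion
    FROM dbo.CustomerProductVersion
    WHERE CustomerId = @CustomerId;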

Or you could write a new package for each version (include the version in the name) to avoid having 20 different paths in one SSIS package, and then have a For Each Loop in a calling SSIS package that sends each file to the correct version. Or simply set up different jobs for each client, knowing which package version they are on; then you could just change the SSIS package their job calls when they upgrade to the new version.
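If you go the one-package-per-version route, the calling package could resolve which child package to execute from a small routing table. A sketch building on the customer lookup above (again, names are illustrative only):

    -- Hypothetical routing table: which child package handles which version.
    CREATE TABLE dbo.VersionPackage (
        ProductVersion varchar(20)   NOT NULL PRIMARY KEY,
        PackageName    nvarchar(260) NOT NULL  -- e.g. N'LoadClientData_v32.dtsx'
    );

    -- The calling package (or the job setup) can find the package to run
    -- for a given customer by joining the two lookup tables.
    DECLARE @CustomerId int = 1001;

    SELECT vp.PackageName
    FROM dbo.CustomerProductVersion AS cpv
    JOIN dbo.VersionPackage         AS vp
      ON vp.ProductVersion = cpv.ProductVersion
    WHERE cpv.CustomerId = @CustomerId;

When a client upgrades, you only update their row in the version table (or their job) rather than touching the packages themselves.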


It sounds like you are trying to avoid having to maintain metadata for all the different possible versions. @HLGEM's solution is good, but it still requires you to maintain metadata for every possible combination of versions.

I had a similar situation where we regularly push out separate client versions, newer versions tend to have extra columns, and we can't force users to upgrade to the latest version. For sources of data where the raw data is from a database table, we always take every column regardless of the user's version. For flat files that we import where the schema is different between versions, we've used three separate solutions:

  1. Conditional Splits: The most obvious solution; it works well when there are few variations and a simple way to detect the difference from a few properties of the row. It doesn't scale well for complex changes because the expressions become too difficult to write and maintain.

  2. Script Transformations: If you read each row in as a single string, you can use a Script Component to determine whether the additional columns need to be written out. This works well when there are a great many different field combinations and the rules for deciding which path to use are highly complex.

  3. Table-driven Metadata: For one corner case where I was importing XML files, I built a control table with version numbers. I loaded the XML into an XML data type column in a table and then processed the XML in a stored procedure. The package iterated over each version number, dynamically generated the SQL it needed from the control table to extract the correct nodes from the XML, and then flagged the original row as processed (a stripped-down sketch follows this list). This was a good solution for my process, but the major challenge with the approach was knowing when to add new rows to the control table. I basically had to give the development group a check box on their SDLC forms that required them to get my sign-off that I had received the new schema changes for major version changes.
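To make approach 3 concrete, here is a heavily simplified sketch. All names (tables, node paths, columns) are invented for illustration, and the shredding query is written out by hand for one version instead of being generated dynamically from the control table as described above:

    -- Hypothetical staging table: raw files loaded as untyped XML.
    CREATE TABLE dbo.XmlStaging (
        StagingId     int IDENTITY(1,1) PRIMARY KEY,
        SchemaVersion varchar(20) NOT NULL,
        Payload       xml         NOT NULL,
        Processed     bit         NOT NULL DEFAULT (0)
    );

    -- Hypothetical control table: which columns exist in which schema version
    -- and where to find them in the XML.
    CREATE TABLE dbo.XmlVersionColumn (
        SchemaVersion varchar(20)   NOT NULL,
        ColumnName    sysname       NOT NULL,
        NodePath      nvarchar(400) NOT NULL,  -- path used to pull the value
        CONSTRAINT PK_XmlVersionColumn PRIMARY KEY (SchemaVersion, ColumnName)
    );

    -- The stored procedure would build a SELECT like this per SchemaVersion
    -- from dbo.XmlVersionColumn and run it with sp_executesql; shown here
    -- hard-coded for one version:
    SELECT s.StagingId,
           r.value('(CustomerId/text())[1]', 'int')            AS CustomerId,
           r.value('(OrderTotal/text())[1]', 'decimal(18,2)')  AS OrderTotal
    FROM dbo.XmlStaging AS s
    CROSS APPLY s.Payload.nodes('/Export/Row') AS x(r)
    WHERE s.SchemaVersion = '2.0'
      AND s.Processed = 0;

The point of the control table is that adding a column in a new product version becomes a data change (new rows in dbo.XmlVersionColumn) rather than a package change.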

I'm not sure if any of these help you, but I hope you can extract something useful from them. Good luck.
