开发者

SSIS - Importing variable format text files

开发者 https://www.devze.com 2022-12-08 12:21 出处:网络
By \"variable format\" I mean that each row in the text file begins with a single character code denoting the record type, and each record type has a different form开发者_运维百科at. I believe this is

By "variable format" I mean that each row in the text file begins with a single character code denoting the record type, and each record type has a different form开发者_运维百科at. I believe this is some sort of fairly common mainframe data export method. What makes matters worse is the fact that the record types are actually hierarchies. The codes in use are related as follows:

0ThisIsAFileHeader
2ThisIsABatchHeader
4ThisIsDetailData
4ThisIsDetailData
6ThisIsAMatchingBatchFooter
8ThisIsAMatchingFileFooter

0/8 are header/trailer records (each of their own format), 2/6 are sub-header/sub-trailer records (also each of their own formats) and 4 indicates the actual data or detail records.

SQL Server 2008, Visual Studio.NET 2008. Is a custom script task the only way to import this data? Given that, could someone point me to a resource that discusses doing so? Importing a fixed-width text file is pretty straightforward, but I'm not sure where the script task would fall in the control flow and how the data would be utilized by subsequent steps.


This could be done within SSIS although you will have to jump through a few hoops. Off the top of my head this is one approach:

  • Source File should be set as 2 columns, 1st character and a string of the rest.
  • Conditional Split action based on the first character splitting the rows into separate workflows.
  • Data Conversion action to take the 2nd column and split it into the appropriate fields, this would have to be done for each type of line separately and attached to the appropriate conditional split line.
  • Destination should be configured for each conditional branch to store / handle the data appropriately.

If further ETL work is required to start linking items together, I would import the files into temp tables or a permanent staging area and perform more transformations before committing the final results.

The real hassle is that the data is related and how you keep this relation / infer it afterwards. I think if I had to throw it together I would use a script component in the data flow that increased a counter every time it saw a header record, and was output into the stream. That way each record would have an identifying number which could be used to relate them back afterwards.

It's a bit complex but the problem can be worked around.


Maybe SSIS has a better way to do this, but when I had to do this task years ago in DTS, this is what I did.

First I brought the data into staging table that had two columns (Or three if I felt I needed an autogenerated id). One column was for the characters that indicated the type of record and onecolumn had all the rest of the data for the line.

Then I separated out to normalized tables based on the data doing any clean up along the way.

Then I imported to my production tables.


If I were you and those 6 differnt character codes were the only ones associated with a specific set of data then I would import it using the Flat File Import component and then switch using the Case component on each one of the codes. I am assuming you'd like to maintain the hierarchy so you could split up the data with the Case and insert it however you want afterwards.

Using a Script component to import data is usually my last resort but I do really like to format the data before the actual Flat File Import using script code when the file (like in your case) doesn't import correctly. I believe writing one formatting application and using that to format any file that comes from a system is more useful than writing a custom script import every time.


I've done things like this many times, but I always pre-processed the data to add a line number to each line. After that it was easy enough to join the table to itself using max/min and comparisons on the line number to keep the sections together.

But that's kind of clumsy. If we use a conditional split, can SSIS itself get a line number? Or can we use an incrementing integer key in place of line number, without risk that a conditional would get them out of order?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号