I need to scrape an old mainframe text file containing Printer Control Language (PCL) for a data import. Altering the mainframe functions isn't an option. The printout contains product sales information and has a hierarchical layout.
My hope is to set up a SQL Server Integration Services (SSIS) import. Ultimately this will be a data import feature in an ASP.NET MVC 3 website backed by a SQL Server 2005 database, so we could avoid SSIS. I currently build C# ASP.NET MVC 3 websites, so using related technologies should be manageable.
Has anyone succeeded in parsing a text report back into a useful data import with text patterns (like regular expressions) in C# or SSIS? Are there any examples out there using a state design pattern?
Most answers I find cover only a small part of the problem, such as how to load a text file and take the nth column in C#. This is more involved: I need to identify each line's type with a pattern that depends on the current import state. Off-the-shelf software would be even better.
Text file example:
this part may be a header for the page which needs skipped
this part may be a header for the page which needs skipped
this part may be a header for the page which needs skipped
first line containing prices
second line containing product description for the first line
third line containing a related product (listing all flavors)
fourth line containing a description for the third line
[third and fourth may repeat]
[product set summary line]
[ repeat for next product]
this part may be a footer for the page that needs skipped
this part may be a footer for the page that needs skipped
At any point, a product's lines may continue across a page break, with header and footer lines falling between the product data.
I've done a lot of parsing in C#. However, here, it's not clear to me what kind of text you need to parse (your example doesn't appear to show the actual text). Obviously, you need some way to identify the type of each line.
Here are a couple of articles that may help:
A Text Parsing Helper Class
A sscanf() Replacement for .NET
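To show what "identify the type of each line" might look like with state, here is a minimal sketch of a switch-based state machine (not the full GoF State pattern). Every regular expression below is invented for illustration, since the real report text isn't shown: I'm assuming page headers and footers can be recognized on their own, that price lines start with a six-digit code, and that related and summary lines carry hypothetical REL and TOTAL markers. The type names (Product, State, ReportParser) are mine as well.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// What the parser expects next depends on what it has just read.
enum State { ExpectPrice, ExpectPriceDescription, ExpectRelatedOrSummary, ExpectRelatedDescription }

class Product
{
    public string PriceLine;
    public string Description;
    // Each pair is (related product line, its description line).
    public List<KeyValuePair<string, string>> Related = new List<KeyValuePair<string, string>>();
    public string Summary;
}

static class ReportParser
{
    // ASSUMPTION: all four patterns are hypothetical; replace them with ones
    // derived from the actual report.
    static readonly Regex PageNoise = new Regex(@"^(\s*$|\f|\x1B|PAGE\s+\d+)");
    static readonly Regex Price = new Regex(@"^\d{6}\s+\d+\.\d{2}");
    static readonly Regex Related = new Regex(@"^\s+REL\s");
    static readonly Regex Summary = new Regex(@"^\s*TOTAL\s");

    public static List<Product> Parse(IEnumerable<string> lines)
    {
        var products = new List<Product>();
        Product current = null;
        string pendingRelated = null;
        var state = State.ExpectPrice;

        foreach (var line in lines)
        {
            // Headers and footers are skipped without changing state,
            // which is what lets a product span a page break.
            if (PageNoise.IsMatch(line)) continue;

            switch (state)
            {
                case State.ExpectPrice:
                    if (!Price.IsMatch(line))
                        throw new FormatException("Expected a price line: " + line);
                    current = new Product { PriceLine = line.Trim() };
                    state = State.ExpectPriceDescription;
                    break;

                case State.ExpectPriceDescription:
                    current.Description = line.Trim();
                    state = State.ExpectRelatedOrSummary;
                    break;

                case State.ExpectRelatedOrSummary:
                    if (Summary.IsMatch(line))
                    {
                        // The summary line closes the current product set.
                        current.Summary = line.Trim();
                        products.Add(current);
                        current = null;
                        state = State.ExpectPrice;
                    }
                    else if (Related.IsMatch(line))
                    {
                        pendingRelated = line.Trim();
                        state = State.ExpectRelatedDescription;
                    }
                    else
                    {
                        throw new FormatException("Expected a related or summary line: " + line);
                    }
                    break;

                case State.ExpectRelatedDescription:
                    current.Related.Add(new KeyValuePair<string, string>(pendingRelated, line.Trim()));
                    state = State.ExpectRelatedOrSummary;
                    break;
            }
        }
        return products;
    }
}
```

You could feed it with File.ReadLines(path) and write the resulting Product objects to the database. Throwing on an unexpected line is a deliberate choice here: with a scraped report you usually want to hear about a layout you didn't anticipate rather than import bad rows silently.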
I worked for some years on COBOL integrations, where I had to break up text strings according to a COBOL "book" (copybook) that held the field specifications.
You can use AGPC.FixedLayout to help with the integration, so you don't need substring calls to extract each field.
The NuGet package is at https://www.nuget.org/packages/AGPC.FixedLayout
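For readers who haven't worked with fixed layouts, here is a rough sketch of the underlying idea: describe each field once as a name, start column, and width, then slice every line from that description. This is not the AGPC.FixedLayout API (check the package for its actual usage); FieldSpec, FixedWidth, and the column positions in the usage note are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

// One field in a fixed-width record: name, zero-based start column, and width.
struct FieldSpec
{
    public readonly string Name;
    public readonly int Start;
    public readonly int Length;

    public FieldSpec(string name, int start, int length)
    {
        Name = name;
        Start = start;
        Length = length;
    }
}

static class FixedWidth
{
    // Cuts a line into named, trimmed fields according to the layout.
    public static Dictionary<string, string> Slice(string line, IEnumerable<FieldSpec> layout)
    {
        var result = new Dictionary<string, string>();
        foreach (var field in layout)
        {
            // Clamp to the line length so short lines yield empty or
            // truncated fields instead of throwing.
            int start = Math.Min(field.Start, line.Length);
            int length = Math.Min(field.Length, line.Length - start);
            result[field.Name] = line.Substring(start, length).Trim();
        }
        return result;
    }
}
```

With a layout such as new[] { new FieldSpec("Sku", 0, 6), new FieldSpec("Price", 8, 8) }, each price line becomes a dictionary keyed by field name, and the column positions live in one place instead of being scattered through the parsing code.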