I am looking at creating a common import routine for various parts of my company's main system, to be used when implementing a new client. For example, we may get Excel or CSV files of inventory, customers, etc. that need to be imported into a common model.
I was wondering if anyone had some good ideas or best practices for doing such a thing (in terms of technology and/or process). We are an MS SQL Server 2005 and .NET shop.
I was thinking of something like UPS's WorldShip, where a program interprets your import file and you match the columns you have to the available columns in the UPS system, but there may be much better ways... that's just an interface I am used to.
Secondarily, I want to build it in such a way that other developers can plug their own data manipulation routines into the process as well (e.g., if the import value is Y, change it to 1). So any ideas on how to accomplish that are greatly appreciated!
I know the information is not enough to give a comprehensive solution. I was just hoping to get some good ideas and maybe a different perspective on how to attack it best ;)
Thanks in advance!
We use SSIS and create parent and child packages. The child packages hold the standard fields, the standard transformations, and the imports to the production tables. The parent packages hold any nonstandard transforms (required because of data issues with that particular client) and nonstandard import tasks (maybe they provide specialized data that doesn't normally need to be imported). The parent package takes the client data in whatever format the client is able to give us (which is all too often not the format we would like to get), transforms it to our standard format, and then calls the child package to do all the standard things. We configure the child packages through variables that are passed from the parent package (things like the client ID, which changes from client to client).
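To make the split concrete, here is a minimal sketch of the same parent/child idea in plain Python. It is not SSIS and does not use any SSIS API; the field names, `child_import`, `parent_import`, `column_map`, and `transforms` are all hypothetical names chosen for illustration. The parent handles the client-specific column mapping and value fixes (such as Y to 1), then hands standard-format rows to the child along with the client ID.

```python
import csv
import io

# Hypothetical standard format expected by the "child" step.
STANDARD_FIELDS = ["sku", "description", "active"]


def child_import(rows, client_id):
    """Standard step: check standard-format rows and tag them with the client id.

    A real child package would load production tables; this sketch only
    returns the rows it would load.
    """
    loaded = []
    for row in rows:
        missing = [f for f in STANDARD_FIELDS if f not in row]
        if missing:
            raise ValueError(f"row is missing standard fields: {missing}")
        loaded.append({"client_id": client_id, **{f: row[f] for f in STANDARD_FIELDS}})
    return loaded


def parent_import(raw_csv, client_id, column_map, transforms):
    """Client-specific step: rename columns, apply per-field fixes, call the child."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    standard_rows = []
    for raw in reader:
        # Map the client's column names onto the standard field names.
        row = {std: raw[src] for src, std in column_map.items()}
        # Apply any pluggable per-field transforms for this client.
        for field, fn in transforms.items():
            row[field] = fn(row[field])
        standard_rows.append(row)
    return child_import(standard_rows, client_id)


# Made-up client file with its own headers and Y/N flags.
client_file = "Item No,Item Name,Is Active\nA100,Widget,Y\nB200,Gadget,N\n"
result = parent_import(
    client_file,
    client_id=42,
    column_map={"Item No": "sku", "Item Name": "description", "Is Active": "active"},
    transforms={"active": lambda v: 1 if v.strip().upper() == "Y" else 0},
)
```

Only `column_map` and `transforms` change per client here; the child stays the same, which is the point of the split.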
One thing to be wary of is developing the child package using a smaller than normal data set. For development purposes, use a file that is the largest size you expect to get from your largest client. You would hate to spend time creating a child package that only works if the file is small and takes 24 hours when the file is large. Best to know what the performance on large files will be in advance.
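One way to check this early is to generate a synthetic file of the size you expect and time the import against it. The sketch below is only an illustration in Python: `make_test_csv`, `time_import`, and the three column names are made up, and `count_rows` is a stand-in for whatever import routine you are actually testing.

```python
import csv
import io
import time


def make_test_csv(n_rows):
    """Build a synthetic CSV with n_rows data rows (hypothetical columns)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["sku", "description", "active"])
    for i in range(n_rows):
        writer.writerow([f"SKU{i:07d}", f"Item {i}", "Y" if i % 2 == 0 else "N"])
    return buf.getvalue()


def time_import(import_fn, n_rows):
    """Run import_fn on a generated file; return (its result, elapsed seconds)."""
    data = make_test_csv(n_rows)
    start = time.perf_counter()
    result = import_fn(data)
    return result, time.perf_counter() - start


def count_rows(data):
    """Stand-in for the real import: parses the file and counts data rows."""
    return sum(1 for _ in csv.DictReader(io.StringIO(data)))


# Compare a small file with a large one to see how the run time scales.
for n in (1_000, 100_000):
    rows, seconds = time_import(count_rows, n)
    print(f"{n} rows: {seconds:.3f}s")
```

If the time grows much faster than the row count, that is worth knowing before the package goes anywhere near a large client's file.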