开发者

CSV Import with Validation

开发者 https://www.devze.com 2023-01-07 07:35 出处:网络
I have a need to import a number of CSV files into corresponding tables in SQL. I am trying to write a stored procedure that will import any of these CSV files, using a number of parameters to set thi

I have a need to import a number of CSV files into corresponding tables in SQL. I am trying to write a stored procedure that will import any of these CSV files, using a number of parameters to set things like file name, destination name etc.

Simple so far. The problem comes because of the structure of this DB. Each data table has a number of (usually 5) columns that are of a set format, and then however many data columns you want. There are then a set of data validation tables that contain specific sets of values that these 5 columns can contain. So the problem is, is that when I do the import from CSV, I need to validate that each row that is imported meets the criteria in these validation tables, essentially that there is a row in the validation table that has data that matches the 5 columns in the imported data.

If it does not, then it needs to write an error to the log and not import it, if it does then it should import it.

Here is an example of w开发者_运维问答hat I mean:

Data Table (where the imported data will go)

|datatype|country|currency| datacolumn1 | datacolumn|
|1       | 2     |  GBP   | 10000       | 400       |
|3       | 4     |  USD   | 10000       | 400       |

Validation table

|datatype|country|currency|
|1       |2      |GBP     |
|2       |3      |USD     |

So the first line is valid, it has a matching record in the validation table for the first 3 columns, but the second is not and should be rejected.

The added problem is that each table can reference a different validation table (although many reference the same one) so the columns that have to be checked often vary in amount and name.

My first problem is really how to do a row by row check when importing from CSV, is there any way to do so without importing into a temporary table first? After that, what is the best way to check that the columns match, in a generic way despite that fact that the name and number of columns change depending on what table is being imported.


You can import the contents of a csv into some temporary tables by using this -

SELECT * into newtable FROM
OPENROWSET ('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};DefaultDir={Directory Path of the CSV File};', 
'SELECT * from yourfile.csv');

Once you have your data in some sql table, you can use an inner join to validate the data and narrow down to the valid rows.

SELECT A.*,B.* FROM newtable A
INNER JOIN validation_table B ON A.Datatype = B.Datatype
INNER JOIN validation_table C ON A.Country = C.Country
INNER JOIN validation_table D ON A.Currency = D.Currency

This should give you the valid rows according to your validation rules.


SSIS would let you check, filter, and process data while it was being loaded. I'm not aware of any other native SQL tool that does this. Without SSIS (or a third-party tool), you'd have to first load all the data from a file into some kind of "staging" table (#temp or dedicate permanent) and work from there.

@Pavan Reddy's OPENROWSET solution should work. I've used views, where I first determined the rows in the source file, built a "mapping" view on the target table, and then BULK INSERTED into the view (which also lets you play games with defaults on "skipped columns").

(Just to mention, you can launch an SSIS package from a stored procedure, using xp_cmdshell to call DTEXEC. It's complex and requires a host of parameters, but it can be done.)

0

精彩评论

暂无评论...
验证码 换一张
取 消