DB load CSV into multiple tables_问答_开发者_运维开发者技术经验分享

开发者 https://www.devze.com 2022-12-18 10:40 出处：网络

UPDATE: added an example to clarify the format of the data. Considering a CSV with each line formatted like this:

UPDATE: added an example to clarify the format of the data.

Considering a CSV with each line formatted like this:

tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,[tbl2.col1:tbl2.col2]+

where [tbl2.col1:tbl2.col2]+ means that there could be any number of these pairs repeated

ex:

tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2,tbl2.col1:tbl2.col2

The tables would relate to eachother using the line number as a key which would have to be 开发者_如何学运维created in addition to any columns mentioned above.

Is there a way to use mysql load data infile to load the data into two separate tables?
If not, what Unix command line tools would be best suited for this?

no, not directly. load data can only insert into one table or partitioned table.

what you can do is load the data into a staging table, then use insert into to select the individual columns into the 2 final tables. you may also need substring_index if you're using different delimiters for tbl2's values. the line number is handled by an auto incrementing column in the staging table (the easiest way is to make the auto column last in the staging table definition).

the format is not exactly clear, and is best done w/perl/php/python, but if you really want to use shell tools:

cut -d , -f 1-5 file | awk -F, '{print NR "," $0}' > table1

cut -d , -f 6- file | sed 's,\:,\,,g' | \
  awk -F, '{i=1; while (i<=NF) {print NR "," $(i) "," $(i+1); i+=2;}}' > table2

this creates table1 and table 2 files with these contents:

1,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
2,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5
3,tbl1.col1,tbl1.col2,tbl1.col3,tbl1.col4,tbl1.col5

and

1,tbl2.col1,tbl2.col2
1,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
2,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2
3,tbl2.col1,tbl2.col2

As you say, the problematic part is the unknown number of [tbl2.col1:tbl2.col2] pairs declared in each line. I would tempted to solve this through sed: split the one file into two files, one for each table. Then you can use load data infile to load each file into its corresponding table.