开发者

BASH - Join on non-first column

开发者 https://www.devze.com 2023-04-11 03:26 出处:网络
I am trying to join 2 files together - both files are in CSV format - both files have the samecolumns. Here is an example of each file :

I am trying to join 2 files together - both files are in CSV format - both files have the same columns. Here is an example of each file :

File 1:

CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,107309   ,2011-09-24 12:47:11.000,AP/157371,07741992165     ,2.3900,.0000,.0000,.0000,.0000,.0000,.0000,2.3900
acme,107309   ,2011-09-24 12:58:32.000,AP/162874,07740992165     ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308   ,2011-09-24 12:58:32.000,AP/162874,07824912428     ,2.0000,.0000,.0000,.0000,.00开发者_Python百科00,.0000,.0000,2.0000
anot,107308   ,2011-09-24 12:47:11.000,AP/157371,07834919928     ,1.5500,.0000,.0000,.0000,.0000,.0000,.0000,1.5500

File 2:

CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,100046,2011-10-05 08:29:19,AB/020152,07824352342,12.77,0.00,0.00,0.00,0.00,0.00,0.00,12.77
anbe,100046,2011-10-05 08:29:19,AB/020152,07741992165,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50
acve,100046,2011-10-05 08:29:19,AB/020152,07740992165,10.00,0.00,0.00,0.00,0.00,0.00,0.00,10.00
asce,100046,2011-10-05 08:29:19,AB/020152,07771335702,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50

I would like to join the 2 files together - but just taking some of the columns the other columns can be ignored (some are the same, some are different) -

AccountRef,telno, rental_file1,rental_file2,gprs_file1,gprs_file2 etc etc ....

The join should be done on the telno column (it seems I have white space in file 1 - hope that can be ignored ?

i have found lots of examples using JOIN but all of them use the first column for the key on the join .... any pointers would be great - thanks


The basic answer is:

join -t , -1 3 -2 4 -1 6 -2 2 file1 file2

This will join the files file1 and file2 on column 3 from file with column 4 from file2, then on columns 6 and 2. The data files must be sorted on those same columns, of course. The -t , sets the separator for CSV - but join will not handle embedded commas inside quoted strings.

If your data is simple (no quoted strings) then you can also use awk. If your data has quoted strings which may contain commas, etc, then you need a CSV-aware tool. I'd probably use Perl with the Text::CSV module (and the Text::CSV_XS module for performance).


awk -F' *, *' 'NR > 1 && NR == FNR {
  _[$5] = $0; next
  }  
NR == 1 {
  print "AccountReference", "TelNo", "Rental_" ARGV[2], \
  "Rental_" ARGV[3], "GPRS_" ARGV[2], "GPRS_" ARGV[3]
  next
  }
$5 in _ {
  split(_[$5], t)
  print $2, $5, $6, t[6], $7, t[7]
  }' OFS=, file2 file1  


Have a look at cat and cut :-)

For instance

cat file1 file2 | cut -d, -f2,5

yields

107309   ,07741992165     
107309   ,07740992165     
107308   ,07824912428     
107308   ,07834919928
100046,07824352342
100046,07741992165
100046,07740992165
100046,07771335702


All the GNU utilities documented here:

http://www.gnu.org/s/coreutils/manual/html_node/index.html#Top

For your problem, see cat, cut, sort, uniq and join.

0

精彩评论

暂无评论...
验证码 换一张
取 消