I am trying to join 2 files together. Both files are in CSV format and both have the same columns. Here is an example of each file:
File 1:
CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,107309 ,2011-09-24 12:47:11.000,AP/157371,07741992165 ,2.3900,.0000,.0000,.0000,.0000,.0000,.0000,2.3900
acme,107309 ,2011-09-24 12:58:32.000,AP/162874,07740992165 ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308 ,2011-09-24 12:58:32.000,AP/162874,07824912428 ,2.0000,.0000,.0000,.0000,.0000,.0000,.0000,2.0000
anot,107308 ,2011-09-24 12:47:11.000,AP/157371,07834919928 ,1.5500,.0000,.0000,.0000,.0000,.0000,.0000,1.5500
File 2:
CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS,Mnet,MnetPlus,SMS,CSD,IntRoaming,NetAmount
acme,100046,2011-10-05 08:29:19,AB/020152,07824352342,12.77,0.00,0.00,0.00,0.00,0.00,0.00,12.77
anbe,100046,2011-10-05 08:29:19,AB/020152,07741992165,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50
acve,100046,2011-10-05 08:29:19,AB/020152,07740992165,10.00,0.00,0.00,0.00,0.00,0.00,0.00,10.00
asce,100046,2011-10-05 08:29:19,AB/020152,07771335702,2.50,0.00,0.00,0.00,0.00,0.00,0.00,2.50
I would like to join the 2 files together, but taking only some of the columns; the others can be ignored (some are the same, some are different):
AccountRef,telno,rental_file1,rental_file2,gprs_file1,gprs_file2
etc etc ....
The join should be done on the telno column (it seems I have whitespace in file 1 - I hope that can be ignored?). I have found lots of examples using join, but all of them use the first column as the key for the join .... any pointers would be great - thanks
The basic answer is:
join -t, -1 5 -2 5 file1 file2
This will join file1 and file2 on column 5 (TelNo) of each file. Note that join matches on a single field per file - the -1 and -2 options name that field, and if you repeat them only the last pair takes effect - so it cannot join on several columns at once. The data files must be sorted on the join column, of course, and the trailing spaces in file 1 need to be stripped first or the keys will not compare equal. The -t, sets the separator for CSV - but join will not handle embedded commas inside quoted strings.
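Concretely, the preparation plus the join might look like this - a sketch using two-row samples trimmed to the first seven columns of the data above (the file names and the .sorted suffix are just placeholders):

```shell
# Placeholder sample data, trimmed to 7 columns (CustName..GPRS)
printf '%s\n' \
  'acme,107309 ,2011-09-24 12:47:11.000,AP/157371,07741992165 ,2.3900,.0000' \
  'acme,107309 ,2011-09-24 12:58:32.000,AP/162874,07740992165 ,2.0000,.0000' > file1
printf '%s\n' \
  'anbe,100046,2011-10-05 08:29:19,AB/020152,07741992165,2.50,0.00' \
  'acve,100046,2011-10-05 08:29:19,AB/020152,07740992165,10.00,0.00' > file2
# Strip the stray spaces before commas, then sort both files on TelNo (field 5)
sed 's/ *,/,/g' file1 | sort -t, -k5,5 > file1.sorted
sort -t, -k5,5 file2 > file2.sorted
# Join on field 5 of each file; -o selects AccountReference, the key (0),
# then Rental and GPRS from each side
join -t, -1 5 -2 5 -o 1.2,0,1.6,2.6,1.7,2.7 file1.sorted file2.sorted
```

This prints one line per TelNo present in both files, with the file1 columns before the file2 columns.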
If your data is simple (no quoted strings) then you can also use awk. If your data has quoted strings which may contain commas, etc., then you need a CSV-aware tool; I'd probably use Perl with the Text::CSV module (and the Text::CSV_XS module for performance).
awk -F' *, *' '
# While reading the first file (file2), index each data row by TelNo ($5)
NR > 1 && NR == FNR {
    _[$5] = $0; next
}
# First input line overall (the file2 header): emit the output header instead;
# ARGV[3] is file1 and ARGV[2] is file2, matching the data column order below
NR == 1 {
    print "AccountReference", "TelNo", "Rental_" ARGV[3], \
        "Rental_" ARGV[2], "GPRS_" ARGV[3], "GPRS_" ARGV[2]
    next
}
# Remaining lines come from file1: print rows whose TelNo was seen in file2
$5 in _ {
    split(_[$5], t)
    print $2, $5, $6, t[6], $7, t[7]
}' OFS=, file2 file1
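To sanity-check the awk approach, here is a hypothetical run against one-row, seven-column excerpts of the sample data (f1.csv and f2.csv are placeholder names; the header labels are built from the file names on the command line):

```shell
# Placeholder excerpts: header plus one matching row each
printf '%s\n' 'CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS' \
  'acme,107309 ,2011-09-24 12:47:11.000,AP/157371,07741992165 ,2.3900,.0000' > f1.csv
printf '%s\n' 'CustName,AccountReference,InvoiceDate,InvoiceRefID,TelNo,Rental,GPRS' \
  'anbe,100046,2011-10-05 08:29:19,AB/020152,07741992165,2.50,0.00' > f2.csv
# Same script as above, run on the excerpts
awk -F' *, *' 'NR > 1 && NR == FNR { _[$5] = $0; next }
  NR == 1 { print "AccountReference", "TelNo", "Rental_" ARGV[3], \
      "Rental_" ARGV[2], "GPRS_" ARGV[3], "GPRS_" ARGV[2]; next }
  $5 in _ { split(_[$5], t); print $2, $5, $6, t[6], $7, t[7] }' OFS=, f2.csv f1.csv
```

The -F' *, *' field separator absorbs the stray spaces around commas in file 1, so the TelNo keys compare equal without any pre-processing.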
Have a look at cat and cut :-)
For instance,
cat file1 file2 | cut -d, -f2,5
yields:
107309 ,07741992165
107309 ,07740992165
107308 ,07824912428
107308 ,07834919928
100046,07824352342
100046,07741992165
100046,07740992165
100046,07771335702
All the GNU utilities are documented here:
http://www.gnu.org/s/coreutils/manual/html_node/index.html#Top
For your problem, see cat, cut, sort, uniq and join.
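For example, a hypothetical one-liner that pulls the unique AccountReference/TelNo pairs out of both files with cut, sed and sort -u (sort -u being shorthand for sort | uniq):

```shell
# Placeholder sample files, trimmed to 5 columns with x for the unused ones;
# field 2 = AccountReference, field 5 = TelNo
printf '%s\n' 'acme,107309 ,x,x,07741992165 ' 'acme,107309 ,x,x,07740992165 ' > file1
printf '%s\n' 'anbe,100046,x,x,07741992165' 'acve,100046,x,x,07740992165' > file2
# Cut the two fields, squeeze out the stray spaces, de-duplicate
cut -d, -f2,5 file1 file2 | sed 's/ //g' | sort -u
```

The sed step takes care of the whitespace problem from file 1 before the pairs are compared.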