I'd like to merge two files according to the content of their 2nd columns.
File 1:
"4742" "209220_at" 2.60700394801826
"104" "209396_s_at" 2.60651442103297
"749" "202409_at" 2.59424724783704
"4168" "209875_s_at" 2.58773204877464
"3973" "1431_at" 2.52832098784342
"1826" "207201_s_at" 2.41685345240968
File2:
"653" "1431_at" 2.14595534191867
"1109" "207201_s_at" 2.13777517447307
"353" "212531_at" 2.12706340284672
"381" "206535_at" 2.11456707231618
"1846" "204534_at" 2.10919474441178
To have in the end:
"3973" "1431_at" 2.52832098784342 "653" "1431_at" 开发者_Go百科 2.14595534191867
"1826" "207201_s_at" 2.41685345240968 "1109" "207201_s_at" 2.13777517447307
I have tried comm
, diff
, some obscure awk
one-liner without any success.
Any help much appreciated.
Ben
You can do that with a combination of the sort
and join
commands. The straightforward approach is
join -j2 <(sort -k2 file1) <(sort -k2 file2)
but that displays slightly differently than you're looking for. It just shows the common join field and then the remaining fields from each file
"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307
If you need the format exactly as you showed, then you would need to tell join
to output in that manner
join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)
where -o
accepts a list of FILENUM.FIELDNUM
specifiers.
Note that the <()
syntax I'm using isn't POSIX sh, so you should sort to a temporary file if you need POSIX sh syntax.
awk '
# store the first file, indexed by col2
NR==FNR {f1[$2] = $0; next}
# output only if file1 contains file2's col2
($2 in f1) {print f1[$2], $0}
' file1 file2
If the files are small, write a program in a scripting language (Perl, Python and Ruby are all good choices) that reads the first into a hash whose keys are the second column, then read through the second file and use hash lookups to fin out what (if anything) can be joined.
If the files are large then for each file swap the first and second columns, pass them through the unix sort utility, and then in a scripting language merge (plus column reorder) the two sorted files.
awk 'FNR==NR{a[$2]=$0} NR>FNR && ($2 in a){ print $0,a[$2] } ' file2 file1
精彩评论