unix: merge 2 files using 2nd columns_问答_开发者

开发者 https://www.devze.com 2023-02-10 12:39 出处：网络

I\'d like to merge two files according to the content of their 2nd columns. File 1: \"4742\"\"209220_at\"2.60700394801826

相关专题：merge

I'd like to merge two files according to the content of their 2nd columns.

File 1:

"4742"  "209220_at"     2.60700394801826
"104"   "209396_s_at"   2.60651442103297
"749"   "202409_at"     2.59424724783704
"4168"  "209875_s_at"   2.58773204877464
"3973"  "1431_at"       2.52832098784342
"1826"  "207201_s_at"   2.41685345240968

File2:

"653"   "1431_at"       2.14595534191867
"1109"  "207201_s_at"   2.13777517447307
"353"   "212531_at"     2.12706340284672
"381"   "206535_at"     2.11456707231618
"1846"  "204534_at"     2.10919474441178

To have in the end:

"3973"  "1431_at"       2.52832098784342 "653"   "1431_at"    开发者_Go百科   2.14595534191867
"1826"  "207201_s_at"   2.41685345240968 "1109"  "207201_s_at"   2.13777517447307

I have tried comm, diff, some obscure awk one-liner without any success. Any help much appreciated. Ben

You can do that with a combination of the sort and join commands. The straightforward approach is

join -j2 <(sort -k2 file1) <(sort -k2 file2)

but that displays slightly differently than you're looking for. It just shows the common join field and then the remaining fields from each file

"1431_at" "3973" 2.52832098784342 "653" 2.14595534191867
"207201_s_at" "1826" 2.41685345240968 "1109" 2.13777517447307

If you need the format exactly as you showed, then you would need to tell join to output in that manner

join -o 1.1,1.2,1.3,2.1,2.2,2.3 -j2 <(sort -k2 file1) <(sort -k2 file2)

where -o accepts a list of FILENUM.FIELDNUM specifiers.

Note that the <() syntax I'm using isn't POSIX sh, so you should sort to a temporary file if you need POSIX sh syntax.

awk '
  # store the first file, indexed by col2
  NR==FNR {f1[$2] = $0; next}
  # output only if file1 contains file2's col2
  ($2 in f1) {print f1[$2], $0}
' file1 file2

If the files are small, write a program in a scripting language (Perl, Python and Ruby are all good choices) that reads the first into a hash whose keys are the second column, then read through the second file and use hash lookups to fin out what (if anything) can be joined.

If the files are large then for each file swap the first and second columns, pass them through the unix sort utility, and then in a scripting language merge (plus column reorder) the two sorted files.

awk 'FNR==NR{a[$2]=$0} NR>FNR && ($2 in a){ print $0,a[$2] } ' file2 file1

unix: merge 2 files using 2nd columns

精彩评论

关注公众号

热门标签

图文推荐

unix: merge 2 files using 2nd columns

更多 问答 相关资讯：

精彩评论

关注公众号

热门标签

图文推荐

更多问答相关资讯：