AWK/BASH: how to match a field in one file from a field in another?

Developer (https://www.devze.com), 2023-01-20 14:01. Source: Internet

Related topics: bash, file, shell

I have 2 files, the first contains the following:

...
John Allen Smith II 16 555-555-5555 10/24/2010
John Allen Smith II 3 555-555-5555 10/24/2010
John Allen Smith II 17 555-555-5555 10/24/2010
John Doe 16 555-555-5555 10/24/2010
Jane Smith 16 555-555-5555 9/16/2010
Jane Smith 00 555-555-5555 10/24/2010
...

and the second file is a list of names, like so:

...
John Allen Smith II
John Doe
Jane Smith
...

Is it possible to use awk (or another bash command) to print the lines in the first file that match any name in the second file? (The names can repeat in the first file.)

Bonus? Is there an easy way to remove those repeated/duplicate lines in the first file?

Thanks very much,

Tomek

awk

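#! /bin/bash
# file1 = data lines, file2 = names. Prints each distinct line of file1
# that starts with a name from file2; output order is unspecified (for-in).
awk 'FNR==NR { a[$0]++; next }   # first file: collect distinct lines
NF { b[$0]++ }                   # second file: collect non-empty names
END{
  for(i in a){
    for(k in b){
      # index() matches literally (no regex); == 1 anchors at line start
      if (index(i, k) == 1) { print i; break }
    }
  }
}' file1 file2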

Expanding on codaddict's answer:

grep -f file2 file1 | sort | uniq

This removes lines that are exactly the same, but as a side effect (which may be unwanted) your data will now be sorted. It also requires the lines to be identical, which is not the case in your example data: the names are the same, but the data after them differs. uniq can skip a given number of fields or characters, but that alone won't work on your data, because your names vary in length and in number of fields. If you know the data fields are always the last 3 fields on a line, then you can do this:

grep -f file2 file1 | sort | rev | uniq -f 3 | rev

Your output will contain only one line for each name, but which one? The one that sorts first lexicographically, because the input was sorted (uniq only collapses adjacent lines, so the sort is needed for it to work right). If you don't want to sort first, or need to control which of the lines are dropped, then an awk, perl, ruby, or python solution using associative arrays will probably work best.
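
As a rough sketch of the associative-array approach, the awk script below keeps the first line seen for each name without sorting. It assumes, as above, that the last 3 fields of every data line are not part of the name, and that names in file2 are separated by single spaces. The temp directory and the variable name first_per_name are only there for the demo; the sample data is taken from the question.

```shell
#!/bin/bash
# Demo files built from the sample data in the question.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
John Allen Smith II 16 555-555-5555 10/24/2010
John Allen Smith II 3 555-555-5555 10/24/2010
John Allen Smith II 17 555-555-5555 10/24/2010
John Doe 16 555-555-5555 10/24/2010
Jane Smith 16 555-555-5555 9/16/2010
Jane Smith 00 555-555-5555 10/24/2010
EOF
cat > "$tmp/file2" <<'EOF'
John Allen Smith II
John Doe
Jane Smith
EOF

# Pass 1 (file2): remember every non-empty name.
# Pass 2 (file1): rebuild the name by dropping the last 3 fields, then
# print the line only if the name is wanted and has not been printed yet.
first_per_name=$(awk '
  FNR == NR { if (NF) names[$0] = 1; next }
  NF > 3 {
    name = $1
    for (i = 2; i <= NF - 3; i++) name = name " " $i
    if ((name in names) && !seen[name]++) print
  }
' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$first_per_name"
rm -rf "$tmp"
```

Note that the names file comes first on the command line here, the opposite of the grep examples. Because the comparison is on the whole name rather than a pattern, "John Doe" would not accidentally match a longer name that merely starts with the same text.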

You can use grep as:

grep -f file2 file1   # file2 is the file with the names.

The -f option of grep reads the patterns to search for from the given file, one pattern per line.

To remove exact duplicate lines from the output you can use sort as:

grep -f file2 file1 | sort -u
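
To see the pipeline in action, here is a small self-contained run. The data is a trimmed version of the question's sample, with one exact duplicate line and one unlisted name ("Mary Major") made up for the demo. The -F option is an addition not in the commands above: it makes grep treat each name as a fixed string rather than a regular expression, which is probably what you want for a list of names. LC_ALL=C is set only so the sort order is predictable.

```shell
#!/bin/bash
tmp=$(mktemp -d)
# Demo data: line 3 repeats line 1 exactly; "Mary Major" is a made-up
# name that does not appear in the names file.
cat > "$tmp/file1" <<'EOF'
John Doe 16 555-555-5555 10/24/2010
Jane Smith 16 555-555-5555 9/16/2010
John Doe 16 555-555-5555 10/24/2010
Mary Major 16 555-555-5555 10/24/2010
Jane Smith 00 555-555-5555 10/24/2010
EOF
cat > "$tmp/file2" <<'EOF'
John Doe
Jane Smith
EOF

# -F: names are literal strings; -f: read them from file2.
# sort -u then drops lines that are exact duplicates.
matches=$(grep -F -f "$tmp/file2" "$tmp/file1" | LC_ALL=C sort -u)

printf '%s\n' "$matches"
rm -rf "$tmp"
```

One caveat: an empty line in the names file matches every line of the data file, so it is worth making sure file2 has no blank lines.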
