
Awk filtering with multiple fields

Developer https://www.devze.com 2023-03-04 07:05 Source: the web

Suppose that I have the following text file (it can have more states, cities, and colleges):

begin_state
New York
end_state

begin_cities
Albany
Buffalo
Syracuse
end_cities

begin_colleges
Cornell
Columbia
Stony Brook
end_colleges

begin_state
California
end_state

begin_cities
San Francisco
Sacramento
Los Angeles
end_cities

begin_colleges
Berkeley
Stanford
Caltech
end_colleges

I want to use awk to filter all the cities and list them under the states, or select all the colleges and list them under the states. For example, if I want the cities, they should be output as follows:

**New York**
Albany
Buffalo
Syracuse
**California**
San Francisco
Sacramento
Los Angeles

Any suggestions are welcome.


Here are two solutions in awk. The first is naive and repetitive but easier to follow and learn from. The second is an attempt at reducing the repetition.

Both solutions are fragile with respect to errors in your data file. If you are free to choose the implementation language, I suggest you do this in something like Ruby, Perl, or Python.

Save either script to a file (e.g. showinfo.sh), make it executable, and invoke it with a single argument, "cities" or "colleges", to select the mode. The script reads the data from stdin, so you must redirect the data file into it.

Example invocation (for either solution):

./showinfo.sh cities < states.txt
./showinfo.sh colleges < states.txt

The naive solution:

#!/bin/bash
set -e
set -u
# mode is "cities" or "colleges"
mode=$1

awk -v mode="$mode" '
# Marker lines only switch the current section; they carry no data.
/begin_state/    {st="states"; next}
/end_state/      {st=""; next}
/begin_cities/   {st="cities"; next}
/end_cities/     {st=""; next}
/begin_colleges/ {st="coll"; next}
/end_colleges/   {st=""; next}

# Skip blank lines between blocks.
NF == 0 {next}

{
  if (st=="states") {
    sn=$0;                # remember the current state name
  }
  else
    if (st=="cities") cities[sn]=cities[sn]"\n"$0
    else if (st=="coll") colleges[sn]=colleges[sn]"\n"$0;
}

END {
  # Note: "for (sn in ...)" visits states in an unspecified order.
  if (mode=="cities") {
    for (sn in cities) { print "=="sn"=="cities[sn] }
  }
  else if (mode=="colleges") {
    for (sn in colleges) { print "=="sn"=="colleges[sn] }
  }
  else { print "set mode either cities or colleges" }
}'

Second solution, with repetition removed:

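#!/bin/bash
set -e
set -u
mode=$1
awk -v mode="$mode" '
# st holds the marker of the block being read, e.g. "begin_cities".
/begin_/    {st=$1; next}
/end_/      {st=""; next}
NF == 0     {next}   # skip blank lines between blocks

{
  if (st=="begin_state") { sn=$0 }
  else { data[st, sn]=data[st, sn]"\n"$0 }
}

END {
  # Keys are "type SUBSEP state"; iteration order is unspecified.
  for (combo in data) {
    split(combo, sep, SUBSEP);
    type = sep[1];
    state_name = sep[2];
    if (type == "begin_"mode) {
      print "==" state_name "==" data[combo];
    }
  }
}'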
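
Both solutions collect everything into arrays and print in the END block, and awk does not specify the order in which "for (key in array)" visits keys, so the states may not come out in input order. If the order matters, one possible alternative is to print while reading instead of storing. The following is only a sketch of that idea, not part of the two solutions above; the function name list_section is made up for illustration, and it assumes every marker starts at the beginning of its line.

```shell
#!/bin/bash
# Sketch: print each state name followed by the items of one section type,
# in the order they appear in the input. list_section is a hypothetical name.
list_section() {
  awk -v mode="$1" '
    /^begin_/ { st = $1; next }     # remember which block we are inside
    /^end_/   { st = ""; next }     # leave the block
    NF == 0   { next }              # ignore blank separator lines
    st == "begin_state"    { print "==" $0 "=="; next }
    st == ("begin_" mode)  { print }
  '
}
```

After defining the function, it could be called as "list_section cities < states.txt". Note that an unknown mode is not reported here; the sketch would simply print the state names with nothing under them.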

Input file used (I note that the one in the question has changed recently):

begin_state
New York
end_state
begin_cities
Albany
Buffalo
Syracuse
end_cities
begin_colleges
Cornell
Columbia
Stony Brook
end_colleges
begin_state
California
end_state
begin_cities
San Francisco
Sacramento
Los Angeles
end_cities
begin_colleges
Berkeley
Stanford
Caltech
end_colleges

Session when running the first solution:

$ bash showinfo.sh cities < states.txt 
==New York==
Albany
Buffalo
Syracuse
==California==
San Francisco
Sacramento
Los Angeles
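
Since both solutions are fragile when the data file contains errors, it may help to check the markers before running either script. The following is a rough sketch of such a check, not a complete validator: the name check_markers is hypothetical, it assumes blocks are never nested, and it treats any line starting with lowercase "end" as an attempted end marker, so a data line beginning with "end" would be flagged too.

```shell
#!/bin/bash
# Sketch: report begin_/end_ markers that do not pair up.
# check_markers is a hypothetical name; messages go to stdout and the
# exit status is 1 if any problem was found, 0 otherwise.
check_markers() {
  awk '
    /^begin_/ {
      if (cur != "") {
        printf "line %d: %s opened inside begin_%s\n", NR, $0, cur
        bad = 1
      }
      cur = substr($0, 7)          # text after "begin_"
      next
    }
    /^end/ {
      if ($0 != ("end_" cur)) {
        printf "line %d: expected end_%s, got \"%s\"\n", NR, cur, $0
        bad = 1
      }
      cur = ""
      next
    }
    END {
      if (cur != "") {
        printf "end of file: begin_%s never closed\n", cur
        bad = 1
      }
      exit bad + 0
    }
  '
}
```

It could be run as "check_markers < states.txt" ahead of showinfo.sh. A misspelled marker such as "end_ state" on line 3 would be reported as: line 3: expected end_state, got "end_ state".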
