Filtering rows based on column values in Bash

Developer https://www.devze.com 2023-02-26 20:31 Source: the web

I've got a bash script which outputs some column-based information. I'd like to give the user some options for matching values in specific columns. For example, ./myColumnDump might print

User Job_name Start_day
andrew job1_id monday
andrew job2_id tuesday
adam job1_id tuesday
adam job2_id monday

and I'd like to add options like ./myColumnDump -j 2 (where -j's argument is a regular expression that is matched against values in the Job_name column).

I'm currently piping the output through grep and embedding the user-specified regexes in one big regex that matches a whole row, but the user might specify -j .*monday, which would spill into a different column.

Is there a nicer way to achieve this in a bash script?
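
To make the spill concrete, here is a hypothetical reconstruction of the whole-row grep approach. The row pattern is an assumption for illustration, not the asker's actual regex; only the sample rows come from the question.

```shell
rows='andrew job1_id monday
andrew job2_id tuesday
adam job1_id tuesday
adam job2_id monday'

j='.*monday'   # meant to be matched against the Job_name column only

# Assumed whole-row pattern: first column, spaces, then the user's regex.
# Because ".*" also matches spaces, the user's regex runs on into Start_day.
printf '%s\n' "$rows" | grep -E "^[^ ]+ +${j}"
# prints:
#   andrew job1_id monday
#   adam job2_id monday
```

Neither job name contains "monday", yet both rows are printed because the match was satisfied by the third column.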

This problem is tailor-made for awk(1). For example, you can do this:

awk '$2 ~ /^job1/'

to print out lines where column two matches ^job1. So, given a column number in N and a regular expression in R, you should be able to do this:

awk "\$${N} ~ /${R}/"

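You will, as usual, need to be careful with your quoting: inside double quotes the shell expands ${N} and ${R} before awk ever sees the program, so a / in R ends the regex early and anything unexpected in N becomes awk code.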
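
One way to sidestep the quoting problem is to keep the awk program constant and hand it the values as data. The sketch below is a swapped-in variant, not the approach shown above: it passes the regex through the environment and reads it with awk's ENVIRON array. The names filter_col and COL_RE are hypothetical, and the column check is an added assumption about what counts as valid input.

```shell
# filter_col N REGEX: print the stdin rows whose Nth whitespace-separated
# column matches REGEX. (filter_col and COL_RE are hypothetical names.)
filter_col() {
    # Accept only a positive integer without leading zeros as the column
    # number, so it can never smuggle awk code into the program.
    case $1 in
        ''|*[!0-9]*|0*) echo "bad column: $1" >&2; return 2 ;;
    esac
    # Pass the regex through the environment: awk reads ENVIRON values
    # verbatim, so "/" and "\" in the pattern need no extra escaping.
    COL_RE=$2 awk -v n="$1" '$n ~ ENVIRON["COL_RE"]'
}

rows='andrew job1_id monday
andrew job2_id tuesday
adam job1_id tuesday
adam job2_id monday'

# A pattern aimed at column 2 cannot spill into column 3.
printf '%s\n' "$rows" | filter_col 2 '.*monday'   # prints nothing
printf '%s\n' "$rows" | filter_col 2 '^job1'      # prints both job1_id rows
```

Because the program text is a fixed single-quoted string, nothing the user types is ever parsed as awk source.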

Here is the complete bash script scan.sh to do your job:

#!/bin/bash
usage()
{
cat << EOF
usage: $0 options
This script scans given input file for specified regex in the input column #   
OPTIONS:
   -h      Show usage instructions
   -f      input data file name
   -r      regular expression to match
   -j      column number
EOF
}   
# process inputs to the script
DATA_FILE=
COL_NUM=
REG_EX=
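# The leading ":" in the optstring selects silent error reporting, so
# unknown options arrive as "?" and missing arguments as ":".
while getopts ":j:f:r:h" OPTION
do
     case $OPTION in
         f) DATA_FILE="$OPTARG" ;;
         r) REG_EX="$OPTARG" ;;
         j) COL_NUM="$OPTARG" ;;
         h) usage
            exit 0 ;;
         :) echo "Option -$OPTARG requires an argument" >&2
            usage
            exit 1 ;;
         \?) usage
             exit 1 ;;
     esac
done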
if [[ -z $DATA_FILE ]] || [[ -z $COL_NUM ]] || [[ -z $REG_EX ]]
then
     usage
     exit 1
fi

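# Print the rows whose column J matches R. Values assigned with -v go through
# awk's escape processing, so a backslash in the regex must be doubled.
awk -v J="${COL_NUM}" -v R="${REG_EX}" '$J ~ R' "${DATA_FILE}"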

TESTING

Let's say this is your data file:

User Job_name Start_day
andrew job1_id monday
andrew job2_id tuesday
adam job1_id tuesday
adam job2_id monday

./scan.sh -j 2 -f data  -r ".*job1.*"
andrew job1_id monday
adam job1_id tuesday

./scan.sh -j 2 -f data  -r ".*job2.*"
andrew job2_id tuesday
adam job2_id monday

./scan.sh -j 1 -f data  -r ".*adam.*"
adam job1_id tuesday
adam job2_id monday

To build on mu is too short's answer, you can pass the user's pattern to awk:

# suppose the -j pattern is in shell var $j
awk -v j="$j" '$2 ~ j'

You will have to advise users to enter a regex pattern that awk understands, though.
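
As a small illustration of what "a regex that awk understands" means in practice: awk patterns are POSIX extended regular expressions, so Perl-style shorthands such as \d are not portable, and a bracket expression is the safe spelling. The sample rows come from the question; the pattern value is an assumed example of user input.

```shell
rows='andrew job1_id monday
andrew job2_id tuesday
adam job1_id tuesday
adam job2_id monday'

# Assumed user input for -j: a bracket expression instead of a Perl-style \d.
j='^job[2-9]_'

printf '%s\n' "$rows" | awk -v j="$j" '$2 ~ j'
# prints:
#   andrew job2_id tuesday
#   adam job2_id monday
```

Bracket expressions also survive the escape processing that awk applies to -v assignments, which is one more reason to prefer [.] over \. when a literal dot is needed.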

Here's a pure bash script (courtesy anubhava)

#!/bin/bash
# tested on bash 4
usage()
{
cat << EOF
usage: $0 options [file]
This script scans given input file for specified regex in the input column #
OPTIONS:
   -h      Show usage instructions
   -f      input data file name
   -r      regular expression to match
   -j      column number

Example:  $0 -j 2 -r "job2" -f file
EOF
}
# process inputs to the script
DATA_FILE=
COL_NUM=
REG_EX=
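# The leading ":" in the optstring selects silent error reporting, so
# unknown options arrive as "?" and missing arguments as ":".
while getopts ":j:f:r:h" OPTION
do
     case $OPTION in
         f) DATA_FILE="$OPTARG" ;;
         r) REG_EX="$OPTARG" ;;
         j) COL_NUM="$OPTARG" ;;
         h) usage
            exit 0 ;;
         :) echo "Option -$OPTARG requires an argument" >&2
            usage
            exit 1 ;;
         \?) usage
             exit 1 ;;
     esac
done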
if [[ -z $DATA_FILE ]] || [[ -z $COL_NUM ]] || [[ -z $REG_EX ]]
then
     usage
     exit 1
fi
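# Read the file line by line; the "|| [[ -n $line ]]" test also handles a
# final line that lacks a trailing newline.
while IFS= read -r line || [[ -n $line ]]
do
    # Split on whitespace without glob-expanding the fields
    # (a bare array=( $line ) would expand a field such as "*").
    read -r -a array <<< "$line"
    col=${array[$((COL_NUM-1))]}
    [[ $col =~ $REG_EX ]] && printf '%s\n' "$line"
done < "$DATA_FILE"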