
Unix - Need to cut a file which has multiple blanks as delimiter - awk or cut?

Developer https://www.devze.com 2023-01-29 06:16 Source: the web
I need to get the records from a text file in Unix. The delimiter is multiple blanks. For example:

2U2133   1239  
1290fsdsf   3234

From this, I need to extract

1239  
3234

The delimiter for all records will always be 3 blanks.

I need to do this in a Unix script (.scr) and either write the output to another file or use it as input to a while loop. I tried the following:

while read readline  
do  
        read_int=`echo "$readline"`  
        cnt_exc=`grep "$read_int" ${Directory path}/file1.txt| wc -l`  
if [ $cnt_exc -gt 0 ]  
then  
  int_1=0  
else  
  int_2=0  
fi  
done < awk -F'  ' '{ print $2 }' ${Directoty path}/test_file.txt  

test_file.txt is the input file and file1.txt is a lookup file. But this does not work; it gives me syntax errors near awk -F.

I tried writing the output to a file. The following worked on the command line:

more test_file.txt | awk -F'   ' '{ print $2 }' > output.txt

This works on the command line and writes the records to output.txt. But the same command does not work in the Unix script (it is a .scr file).

Please let me know where I am going wrong and how I can resolve this.

Thanks,

Visakh


The job of replacing multiple delimiters with just one is left to tr:

cat <file_name> | tr -s ' ' | cut -d ' ' -f 2

tr translates or deletes characters, and is perfectly suited to prepare your data for cut to work properly.

The manual states:

-s, --squeeze-repeats
          replace each sequence  of  a  repeated  character  that  is
          listed  in the last specified SET, with a single occurrence
          of that character
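
To make the effect of squeezing concrete, here is a minimal sketch. The sample data and the temporary file are assumptions made up for illustration; they only mirror the two records from the question.

```shell
# Hypothetical sample data mirroring the question: fields separated by three blanks.
sample=$(mktemp)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$sample"

# Without squeezing, cut treats every single blank as a separator,
# so field 2 is the empty string between the first and second blank.
unsqueezed=$(cut -d ' ' -f 2 "$sample")

# After tr -s collapses each run of blanks into one, field 2 is the wanted value.
squeezed=$(tr -s ' ' < "$sample" | cut -d ' ' -f 2)

printf '%s\n' "$squeezed"
rm -f "$sample"
```

Redirecting the file into tr also avoids the extra cat process, although the result is the same.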


It depends on the version or implementation of cut on your machine. Some versions support an option, usually -i, that means 'ignore blank fields' or, equivalently, allow multiple separators between fields. If that's supported, use:

cut -i -d' ' -f 2 data.file

If not (and it is not universal, and maybe not even widespread, since neither GNU nor Mac OS X has the option), then using awk is better and more portable.
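
As a rough illustration of the awk route: with its default field separator, awk splits on any run of blanks or tabs, so the number of blanks between fields does not matter. The function name second_field and the sample input are hypothetical.

```shell
# second_field is a hypothetical helper name, not something from the question.
# With no -F option, awk splits each line on runs of blanks and tabs,
# and ignores leading and trailing blanks.
second_field() {
    awk '{ print $2 }' "$@"
}

# Sample input made up to mirror the question, including trailing blanks.
result=$(printf '2U2133   1239  \n1290fsdsf   3234\n' | second_field)
printf '%s\n' "$result"
```

The helper reads standard input when called without arguments and reads the named files otherwise, for example second_field test_file.txt.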

You need to pipe the output of awk into your loop, though:

awk -F' ' '{print $2}' ${Directory_path}/test_file.txt |
while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done

The only residual issue is whether the while loop runs in a sub-shell and is therefore not modifying your main shell script's variables, just its own copy of those variables.
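
The sub-shell concern can be sketched as follows. The counters and the sample data are hypothetical. Whether the piped loop runs in a sub-shell depends on the shell: bash in its default mode and many sh implementations do so, while ksh and zsh typically run the last pipeline stage in the current shell. Writing the awk output to a temporary file first is one portable way to sidestep the question, assuming mktemp is available.

```shell
# Hypothetical sample input; three blanks separate the fields.
data=$(printf 'a   1\nb   2\nc   3')

# Piped form: in shells that run the loop in a sub-shell, updates to this
# counter are lost once the loop ends.
piped_count=0
printf '%s\n' "$data" | awk '{ print $2 }' |
while read -r value
do
    piped_count=$((piped_count + 1))
done

# Temporary-file form: the loop runs in the current shell, so the counter survives.
tmp=$(mktemp)
printf '%s\n' "$data" | awk '{ print $2 }' > "$tmp"
file_count=0
while read -r value
do
    file_count=$((file_count + 1))
done < "$tmp"
rm -f "$tmp"

echo "piped_count=$piped_count file_count=$file_count"
```

The value of piped_count varies by shell, whereas file_count reflects all three input lines.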

With bash, you can use process substitution:

while read readline  
do  
    read_int=`echo "$readline"`  
    cnt_exc=`grep "$read_int" ${Directory_path}/file1.txt| wc -l`  
    if [ $cnt_exc -gt 0 ]  
    then int_1=0  
    else int_2=0
    fi  
done < <(awk -F' ' '{print $2}' ${Directory_path}/test_file.txt)

This leaves the while loop in the current shell, but arranges for the output of the command to appear as if from a file.

The blank in ${Directory path} is not normally legal — unless it is another Bash feature I've missed out on; you also had a typo (Directoty) in one place.
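
Separately from how the loop is fed, the loop body itself can be tightened. The sketch below is one possible tidy-up, not the asker's actual script: the directory, the file contents, and the found/missing counters are all invented for illustration. It uses an underscore in Directory_path, quotes the expansions, and lets grep's exit status drive the if directly instead of counting lines with wc.

```shell
# Hypothetical stand-ins for the question's directory and files.
Directory_path=$(mktemp -d)
printf '2U2133   1239\n1290fsdsf   3234\n' > "$Directory_path/test_file.txt"
printf '1239\n9999\n' > "$Directory_path/file1.txt"

found=0
missing=0
awk '{ print $2 }' "$Directory_path/test_file.txt" > "$Directory_path/keys.txt"
while read -r key
do
    # grep -q prints nothing and only sets the exit status;
    # -F treats the key as a literal string rather than a regular expression.
    if grep -q -F -- "$key" "$Directory_path/file1.txt"
    then found=$((found + 1))
    else missing=$((missing + 1))
    fi
done < "$Directory_path/keys.txt"

echo "found=$found missing=$missing"
rm -rf "$Directory_path"
```

With this sample data, 1239 is present in the lookup file and 3234 is not.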


Other ways of doing the same thing aside, the error in your program is this: You cannot redirect from (<) the output of another program. Turn your script around and use a pipe like this:

awk -F'   ' '{ print $2 }' ${Directory path}/test_file.txt | while read readline

etc.

Besides, the use of "readline" as a variable name may or may not get you into problems.


In this particular case, you can use the following line

sed 's/   /\t/g' <file_name> | cut -f 2

to get your second columns.
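
One caveat: \t in the replacement is understood by GNU sed but not by every sed implementation. A slightly more portable sketch, using made-up sample input, puts a literal tab into a variable first:

```shell
# Build a literal tab character; command substitution only strips trailing newlines.
tab=$(printf '\t')

# Sample input made up to mirror the question (exactly three blanks between fields).
# Each run of three blanks becomes one tab, which is cut's default delimiter.
result=$(printf '2U2133   1239\n1290fsdsf   3234\n' | sed "s/   /$tab/g" | cut -f 2)
printf '%s\n' "$result"
```

This relies on the delimiter being exactly three blanks, as the question states.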


In bash you can start from something like this:

for n in `cut -d " " -f 4 "${Directory_path}/test_file.txt"`
do
    grep -c "$n" "${Directory_path}"/file*.txt
done
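
The reason for asking cut for field 4 rather than field 2 can be seen in a small sketch. The sample line is hypothetical but follows the three-blank format from the question.

```shell
# Hypothetical sample line with exactly three blanks between the two values.
line='2U2133   1239'

# With a single blank as the delimiter, cut sees four fields:
# "2U2133", "", "", "1239". The two empty fields sit between the blanks.
f2=$(printf '%s\n' "$line" | cut -d ' ' -f 2)
f4=$(printf '%s\n' "$line" | cut -d ' ' -f 4)

echo "field 2: [$f2]  field 4: [$f4]"
```

This only works while the delimiter is always exactly three blanks; a different count shifts the field number.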


This should have been a comment, but since I cannot comment yet, I am adding this here. This is from an excellent answer here: https://stackoverflow.com/a/4483833/3138875

tr -s ' ' <text.txt | cut -d ' ' -f4

tr -s '<character>' squeezes multiple repeated instances of <character> into one. For the data in this question, the wanted column is then field 2 (-f2) rather than field 4.


It's not working in the script because of the typo in "Directo*t*y path" (last line of your script).


Cut isn't flexible enough. I usually use Perl for that:

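perl -F'   ' -lane 'print $F[1]' file.txt

The -n and -a switches make Perl loop over the input and split each line into @F, and -l strips the trailing newline on input and adds one back on print. Instead of a triple space after -F you can put any Perl regular expression. You access fields as $F[n], where n is the field number (counting starts at zero). This way there is no need for sed or tr.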
