How to extract words from multiple lines in Unix?

Developer https://www.devze.com 2023-03-12 18:07 Source: Internet

I want to extract some specific words from the following text:

Exported Layer : missing_hello  
Comment :   
Total Polygons : 20000 (reported 100).

I want to extract "missing_hello" and "20000" from the text above and display them as:

missing_hello : 20000

How can I do that in Unix?

Assuming that the layer name (missing_hello here) is always a single word, you can use:

perl -lane '$el = $F[3] if /Exported Layer/; print "$el : $F[3]" if /Total Polygons/'

Take a look at this guide: http://www.grymoire.com/Unix/Sed.html

Sed is certainly a tool worth learning. I would look specifically at the sections titled "Using \1 to keep part of the pattern", and "Working with Multiple Lines".

If you have perl, you could use this:

use strict;
use warnings;

my $layer;
my $polys;

while (<>) {
    if ($_ =~ m{^Exported \s Layer \s : \s (\S+)}xms) {
        $layer = $1;
        next;
    }
    if ($_ =~ m{^Total \s Polygons \s : \s (\d+)}xms) {
        $polys = $1;
    }
    if (defined $layer && defined $polys) {
        print "$layer : $polys\n";
        $layer = $polys = undef;
    }
}

In awk:

awk -F: '/Exported Layer/ { export_layer = $2 }
         /Total Polygons/ { printf("%s : %s\n", export_layer, $2); }' "$@"

If the input is garbage, the output will be too (GIGO). If the fields can contain colons, life gets messier.
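One way to sidestep the colon problem in awk is to stop splitting on ':' and instead strip the known label from the front of each line with sub(). The sketch below assumes the labels are exactly "Exported Layer" and "Total Polygons" at the start of a line; the function name layer_counts_colon_safe is made up for illustration.

```shell
# Hypothetical helper: reads files given as arguments, or stdin if none.
# Everything after the first "label :" is kept, so colons in values survive.
layer_counts_colon_safe() {
    awk '
        /^Exported Layer *:/ {
            layer = $0
            sub(/^Exported Layer *: */, "", layer)   # drop the label
            sub(/[ \t]+$/, "", layer)                # drop trailing blanks
        }
        /^Total Polygons *:/ {
            rest = $0
            sub(/^Total Polygons *: */, "", rest)
            sub(/[ \t]+$/, "", rest)
            printf("%s : %s\n", layer, rest)
        }
    ' "$@"
}
```

Like the version above, this still prints any commentary that follows the number on the 'Total Polygons' line.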

In sed:

sed -n -e '/Exported Layer : *\(.*\)/{s//\1 : /;h;}' \
       -e '/Total Polygons : *\(.*\)/{s//\1/;x;G;s/\n//;p;}' "$@"

Colons in fields are not a problem with this sed version.

Now tested on MacOS X 10.6.7. Both scripts include the commentary after the number in the 'Total Polygons' line. Both scripts can fairly easily be revised to only print the number and ignore the commentary. It would help to have a precise definition of all the format possibilities.
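As a rough sketch of that revision, the two functions below print only the number and also trim trailing spaces from the layer name. They assume the number is a plain run of digits right after "Total Polygons : " and that the layer name contains no colon in the awk variant; the function names are hypothetical.

```shell
# awk variant: split on the colon plus surrounding spaces, then keep only
# the first whitespace-separated word of the polygon field.
layer_counts_awk() {
    awk -F' *: *' '
        /^Exported Layer/ { layer = $2; sub(/[ \t]+$/, "", layer) }
        /^Total Polygons/ { split($2, parts, " "); print layer " : " parts[1] }
    ' "$@"
}

# sed variant: capture the name without trailing spaces, and capture only
# the leading digits of the polygon count.
layer_counts_sed() {
    sed -n -e '/Exported Layer : *\(.*[^ ]\) *$/{s//\1 : /;h;}' \
           -e '/Total Polygons : *\([0-9][0-9]*\).*/{s//\1/;x;G;s/\n//;p;}' "$@"
}
```

With the sample input from the question, either function should print "missing_hello : 20000".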

I would probably actually use Perl (or Python) to do this job; the field splitting is just messy enough to benefit from the better facilities in those languages.