How can I use awk to print a line only if its right half _doesn't_ match the previous line's right half?

Developer https://www.devze.com 2023-01-22 21:30 Source: Internet
I have text like:

[100 ps]  bar
[139 ps]  foo de fa fa
[145 ps]  foo de fa fa
[147 ps]  foo de fa fa
[149 ps]  le pamplemouse
[150 ps]  le pamplemouse
[177 ps]  le pomme de terre
[178 ps]  le pomme de terre

In awk I want to filter out all the lines where the right half of the line matches the right half of the previous line, i.e. uniquify the lines as if there were no time stamp. So I'd nix:

    [100 ps]  bar
    [139 ps]  foo de fa fa
    [145 ps]  foo de fa fa  <-- Nuked
    [147 ps]  foo de fa fa  <-- Nuked
    [149 ps]  le pamplemouse
    [150 ps]  le pamplemouse <-- Nuked
    [177 ps]  le pomme de terre 
    [178 ps]  le pomme de terre <-- Nuked

To give me an output of:

    [100 ps]  bar
    [139 ps]  foo de fa fa
    [149 ps]  le pamplemouse
    [177 ps]  le pomme de terre

How can this be done?

EDIT: Sorry, I wasn't as clear as I should have been. The left half of the string is a time stamp with a constant number of tokens, but the right half will have many tokens. In general, can I create arbitrary memory groupings like:

(regex1)(regex2)

Then compare $2, where $2 is the part of the line that matches regex2?


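Running on ideone:

BEGIN { prev = "" }

# Compare the whole right half (the text after "]"), not only $3,
# because the right half can contain several words.
{ right = substr($0, index($0, "]") + 1) }

right == prev { next }

{ prev = right; print }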


You could use associative arrays to maintain a counter for each key on the right side.

This is a proof-of-concept one-liner that you can use as a starting point:

$ printf '[100 ps] bar\n[139 ps] foo\n[140 ps] foo\n' |
  awk '{count[$3]++; if (count[$3] == 1) print;}'
[100 ps] bar
[139 ps] foo

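This would have to be tweaked if the right-side string can contain spaces, since $3 is only the first word after the time stamp. It also drops a repeated key anywhere in the input, not only on consecutive lines.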
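
One way to make that tweak, as a rough sketch: use everything after the closing `]` of the time stamp as the array key instead of `$3`. The function name `dedupe_right_half` is made up for illustration, and the code assumes that each line's time stamp ends at the first `]` on the line.

```shell
# Hypothetical helper: keep the first line seen for each distinct right half.
# Assumption: the time stamp ends at the first "]" on the line.
dedupe_right_half() {
  awk '{
    key = substr($0, index($0, "]") + 1)   # text after the time stamp
    gsub(/^[ \t]+|[ \t]+$/, "", key)       # trim surrounding blanks
    if (count[key]++ == 0) print           # print only the first occurrence
  }'
}

printf '[100 ps]  bar\n[139 ps]  foo de fa fa\n[145 ps]  foo de fa fa\n[177 ps]  le pomme de terre\n' |
  dedupe_right_half
```

Like the one-liner above, this keeps one line per distinct right half across the whole input, so it is a global de-duplication rather than a comparison with the previous line only.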


What separates the right half from the left half? Is it a tab or multiple spaces? If it's a tab then:

awk -F '\t' '
    $2 in seen {next} 
    { print; seen[$2]=1 }
'

Otherwise, I'd write something like

perl -ane '
    $right_half = join " ", @F[2..$#F];   # 2..-1 would be an empty range
    if (not $seen{$right_half}) {
        print;
        $seen{$right_half} = 1;
    }
'


$ awk -F"][ \t]+" '!a[$2]++' file
[100 ps]  bar
[139 ps]  foo de fa fa
[149 ps]  le pamplemouse
[177 ps]  le pomme de terre
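
On the EDIT's question about `(regex1)(regex2)` groupings: POSIX awk has no capture groups, but `match()` sets `RSTART` and `RLENGTH`, so a rough equivalent is to match the time stamp pattern at the start of the line and treat the remainder as the second group. The sketch below does that and compares against the previous line only, so a right half that reappears later in the input is printed again. The function name `uniq_right_half` is hypothetical, and the time stamp pattern is an assumption based on the sample input.

```shell
# Hypothetical helper: drop a line when its right half equals the right
# half of the previous line. The time stamp pattern is assumed from the sample.
uniq_right_half() {
  awk '
    # "regex1": a bracketed time stamp plus the blanks that follow it
    match($0, /^\[[^]]*\][ \t]*/) {
      right = substr($0, RSTART + RLENGTH)   # "regex2": the rest of the line
      if (!have_prev || right != prev) print
      prev = right
      have_prev = 1
      next
    }
    # Lines without a time stamp pass through and reset the comparison
    { print; have_prev = 0 }
  '
}

printf '[100 ps]  bar\n[139 ps]  foo de fa fa\n[145 ps]  foo de fa fa\n[149 ps]  le pamplemouse\n[150 ps]  le pamplemouse\n' |
  uniq_right_half
```

Fed the full sample input from the question, this should print the four lines listed there as the desired output.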