I have text like:
[100 ps] bar
[139 ps] foo de fa fa
[145 ps] foo de fa fa
[147 ps] foo de fa fa
[149 ps] le pamplemouse
[150 ps] le pamplemouse
[177 ps] le pomme de terre
[178 ps] le pomme de terre
In awk, I want to filter out every line whose right half (the text after the timestamp) matches the right half of the previous line, i.e. uniquify the lines as if there were no timestamp. So I'd drop:
[100 ps] bar
[139 ps] foo de fa fa
[145 ps] foo de fa fa <-- Nuked
[147 ps] foo de fa fa <-- Nuked
[149 ps] le pamplemouse
[150 ps] le pamplemouse <-- Nuked
[177 ps] le pomme de terre
[178 ps] le pomme de terre <-- Nuked
To give me an output of:
[100 ps] bar
[139 ps] foo de fa fa
[149 ps] le pamplemouse
[177 ps] le pomme de terre
How can this be done?
EDIT: Sorry, I wasn't as clear as I should have been. The left half of each line is a timestamp with a constant number of tokens, but the right half can have many tokens. In general, can I create arbitrary capturing groups (memory groupings) like:
(regex1)(regex2)
and then compare $2, where $2 is the part of the line that matches regex2?
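POSIX awk's match() has no capturing groups, but it sets RSTART and RLENGTH, so "the part after regex1" can be taken with substr(). A minimal sketch, assuming the timestamp (regex1) always looks like "[100 ps]"; right_half is a hypothetical helper name. GNU awk additionally accepts a third array argument to match() that receives the parenthesized subexpressions, but that is a gawk extension.

```shell
# right_half: hypothetical helper that prints only the part of each line
# after a leading "[<digits> ps]" timestamp, i.e. what (regex2) would capture.
right_half() {
  awk '{
    # regex1: assumed timestamp format, plus any blanks that follow it
    if (match($0, /^\[[0-9]+ ps][ \t]*/))
      print substr($0, RSTART + RLENGTH)   # everything after the match
    else
      print                                # no timestamp: keep the whole line
  }' "$@"
}
```

For example, `printf '[139 ps] foo de fa fa\n' | right_half` prints `foo de fa fa`.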
Running on ideone:
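BEGIN {prev=""}
# Key on everything after the two timestamp tokens ("[145" and "ps]"),
# not just $3, so multi-word messages like "le pomme de terre" compare whole.
{ key = $0; sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]*/, "", key) }
key == prev {next}
{ prev = key;
print;}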
You could use associative arrays to maintain a counter for each key on the right side.
This is a proof-of-concept one-liner that you can use as a starting point:
$ printf '[100 ps] bar\n[139 ps] foo\n[140 ps] foo\n' |
awk '{count[$3]++; if (count[$3] == 1) print;}'
[100 ps] bar
[139 ps] foo
This would have to be tweaked if the right side string can contain spaces.
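One possible tweak for messages that contain spaces, sketched under the assumption that the timestamp ends at the first "]" on the line; dedupe_by_message is a hypothetical helper name. Like the one-liner, it keeps the first line for each distinct message anywhere in the input, not only among adjacent lines.

```shell
# dedupe_by_message: hypothetical helper; the key is everything after the
# first "]", so multi-word messages are compared as a whole.
dedupe_by_message() {
  awk '{
    key = substr($0, index($0, "]") + 1)   # drop "[... ps]" (whole line if no "]")
    sub(/^[ \t]+/, "", key)                # drop the blanks after the timestamp
    if (count[key]++ == 0) print           # print the first occurrence only
  }' "$@"
}
```

Using index() and substr() instead of a bracket expression such as [^]] sidesteps differences in how awk implementations parse "]" inside character classes.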
What separates the right half from the left half? Is it a tab or multiple spaces? If it's a tab, then:
awk -F '\t' '
$2 in seen {next}
{ print; seen[$2]=1 }
'
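If it's multiple spaces instead, the same idea can be sketched with a regex field separator, assuming the halves are separated by a run of two or more spaces and that such a run never occurs inside the timestamp or the message; dedupe_two_spaces is a hypothetical helper name.

```shell
# dedupe_two_spaces: hypothetical variant of the tab version; "  +" is a
# regex FS meaning "two or more spaces", so single spaces stay inside $2.
dedupe_two_spaces() {
  awk -F '  +' '
    $2 in seen { next }   # message already printed earlier: skip this line
    { print; seen[$2] = 1 }
  ' "$@"
}
```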
Otherwise, I'd write something like:
perl -ane '
$right_half = join " ", @F[2..$#F];
if (not $seen{$right_half}) {
print;
$seen{$right_half} = 1;
}
'
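The same field-joining approach can be sketched in plain awk, assuming the timestamp is always exactly two whitespace-separated fields; dedupe_fields is a hypothetical helper name. Because awk's default field splitting collapses runs of blanks, "foo   de fa" and "foo de fa" get the same key, just as with Perl's join " ".

```shell
# dedupe_fields: hypothetical awk counterpart of the Perl version; the key is
# fields 3..NF joined by single spaces (fields 1-2 are the timestamp).
dedupe_fields() {
  awk '{
    key = ""
    for (i = 3; i <= NF; i++)
      key = key (i > 3 ? " " : "") $i
    if (!(key in seen)) { print; seen[key] = 1 }   # first occurrence only
  }' "$@"
}
```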
$ awk -F"][ \t]+" '!a[$2]++' file
[100 ps] bar
[139 ps] foo de fa fa
[149 ps] le pamplemouse
[177 ps] le pomme de terre
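Note that `!a[$2]++` drops a message if it appeared anywhere earlier, while the question asks only about the previous line. A sketch of a consecutive-only variant with the same field separator; dedupe_consecutive is a hypothetical helper name.

```shell
# dedupe_consecutive: hypothetical helper; compares each message only with
# the previous line's message, so a non-adjacent repeat is printed again.
dedupe_consecutive() {
  awk -F '][ \t]+' 'NR == 1 || $2 != prev { print } { prev = $2 }' "$@"
}
```

Keeping a single prev variable instead of an array also means memory use does not grow with the number of distinct messages.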