Remove lines with duplicate cells

开发者 https://www.devze.com 2023-03-20 05:26 Source: web
I need to remove lines with a duplicate value. For example, I need to remove the two lines in the block below that contain "Value04". I cannot simply remove all lines containing "Value03", because some lines with that value are NOT duplicates and must be kept. I can use any tool: Excel, Vim, or any Linux command-line utility.

In the end there should be no duplicate "UserX" values: User1 should appear only once. If User1 appears twice, I need to remove the entire line containing "Value04" and keep the one with "Value03".

Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4

Your ideas and thoughts are greatly appreciated.

Edit: Reworded for clarity and restored words that were left out during editing.


The following Awk command removes all but the first occurrence of a value in the third column:

$ awk -F',' '{
  # Print a line only the first time its third field (the user) is seen.
  if (!seen[$3]) {
    seen[$3] = 1
    print
  }
}' textfile.txt

Output:

Value01,Value03,User1
Value01,Value03,User2
Value01,Value03,User3
Value01,Value03,User4
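Keeping the first occurrence only gives the desired result when the "Value03" line comes before the "Value04" line for each user, as it does in the sample. If that order is not guaranteed, a two-pass variant might look like the sketch below. It assumes the unwanted value is always the literal "Value04" in the second column; the temporary file and the `result` variable are hypothetical names used only for illustration.

```shell
# Hypothetical sample file reproducing the data from the question.
tmpfile=$(mktemp)
cat > "$tmpfile" <<'EOF'
Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4
EOF

# The same file is passed twice so awk reads it in two passes.
# Pass 1 (NR == FNR): count how often each user (column 3) appears.
# Pass 2: drop a line only if its user is duplicated AND column 2 is "Value04".
result=$(awk -F',' '
  NR == FNR { count[$3]++; next }
  !(count[$3] > 1 && $2 == "Value04")
' "$tmpfile" "$tmpfile")

printf '%s\n' "$result"
rm -f "$tmpfile"
```

Because the decision is based on the per-user count rather than on which line comes first, a "Value04" line for a user that appears only once would be kept.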


The same thing in Perl:

perl -F, -nae 'print unless $c{$F[2]}++;' textfile.txt

This uses autosplit mode: "-F, -a" splits each line on commas and stores the resulting fields in the @F array.
