I need to remove lines with a duplicate value. For example, I need to remove lines 2 and 4 in the block below because they contain "Value04". I cannot remove all lines containing Value03 because there are lines with that data that are NOT duplicates and must be kept. I can use any editor: Excel, Vim, or any Linux command-line tool.
In the end there should be no duplicate "UserX" values. User1 should only appear once. But if User1 exists twice, I need to remove the entire line containing "Value04" and keep the one with "Value03".
Value01,Value03,User1
Value02,Value04,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4
Your ideas and thoughts are greatly appreciated.
Edit: Revised for clarity and to restore words that were left out during editing.
The following Awk command removes all but the first occurrence of a value in the third column:
$ awk -F',' '{
    if (!seen[$3]) {
        seen[$3] = 1
        print
    }
}' textfile.txt
Output:
Value01,Value03,User1
Value01,Value03,User2
Value01,Value03,User3
Value01,Value03,User4
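Note that the command above keeps whichever row comes first for each user, so it meets the requirement only when the "Value03" row precedes the "Value04" row. The following is a minimal sketch for input where that order is not guaranteed. It assumes the file can be read twice; the temporary file and the reordered sample rows (User1's "Value04" row placed first) are there only for illustration.

```shell
# Hypothetical sample: the question's data, but with User1's rows swapped.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Value02,Value04,User1
Value01,Value03,User1
Value01,Value03,User2
Value02,Value04,User2
Value01,Value03,User3
Value01,Value03,User4
EOF

# Pass 1 (NR == FNR): count how many rows each user has.
# Pass 2: print every row except "Value04" rows of users seen more than once.
result=$(awk -F',' '
    NR == FNR { count[$3]++; next }
    !(count[$3] > 1 && $2 == "Value04")
' "$tmp" "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

A user who has only a "Value04" row is kept, since that row is not a duplicate. A user with two "Value03" rows would keep both, so this variant is not a general de-duplicator.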
The same thing in Perl:
perl -F, -nae 'print unless $c{$F[2]}++;' textfile.txt
This uses autosplit mode: "-F, -a" splits each line on commas and places the fields in the @F array, so $F[2] is the third column.
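The post-increment trick in the Perl one-liner also works in awk, which gives a shorter form of the first answer. A small sketch, feeding the question's sample rows through a pipe instead of textfile.txt:

```shell
# seen[$3]++ evaluates to 0 (false) the first time a user appears,
# so !seen[$3]++ is true only for the first row of each user.
result=$(printf '%s\n' \
    'Value01,Value03,User1' \
    'Value02,Value04,User1' \
    'Value01,Value03,User2' \
    'Value02,Value04,User2' \
    'Value01,Value03,User3' \
    'Value01,Value03,User4' | awk -F',' '!seen[$3]++')

printf '%s\n' "$result"
```

Like the longer version, this keeps the first row per user, so it relies on the "Value03" row appearing before the "Value04" row.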