
awk: how to remove duplicates in a field, except for some specific strings

Developer (https://www.devze.com), 2023-01-29 08:13, source: web

This is the structure of my csv file:

Oslo        Company1           Mission1
Oslo        Company1           Mission2 
Oslo        Company3           Missionspecial 
Oslo        Companyspecial     Missionspecial
Paris       Company2           Mission1
Paris       Companyspecial     Mission2 
Paris       Company3           Missionspecial

I want to blank out every repeated value in fields 1, 2 and 3, except for the special strings "Companyspecial" and "Missionspecial", which should always be kept, so that the output is:

Oslo        Company1             Mission1
                                 Mission2
            Company3             Missionspecial
            Companyspecial       Missionspecial
Paris       Company2             
            Companyspecial       
                                 Missionspecial

All I know how to do is remove all duplicates with this bit of code:

x[$1]++ {$1=""} x[$2]++ {$2=""} x[$3]++ {$3=""} {print $1,$2,$3,et.....}

I'm no programmer. Help would be greatly appreciated, as it will save hours of tedious manual work! Thank you very much in advance!


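awk '{
  # Check each of the first three fields in turn.
  for(i=1;i<=3;i++)
    # Leave the protected strings alone, even when they repeat.
    if($i !~ /(Mission|Company)special/)
      # a[i,$i] counts how often this text has appeared in field i;
      # a nonzero count means it is a repeat, so blank it.
      if(a[i,$i]++)
        $i=""
  # Print in fixed-width columns so blanked fields keep the alignment.
  printf("%-12s%-19s%-s\n",$1,$2,$3)
}'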
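
If you want to try this on the sample rows from the question, something like the following should work. It is only a sketch: the wrapper name dedupe_fields is made up for illustration, and the rows are fed in through a here-document instead of a real file.

```shell
#!/bin/sh
# dedupe_fields is a hypothetical wrapper name, not part of the answer itself.
dedupe_fields() {
  awk '{
    for (i = 1; i <= 3; i++)
      if ($i !~ /(Mission|Company)special/)
        if (a[i,$i]++)
          $i = ""
    printf("%-12s%-19s%-s\n", $1, $2, $3)
  }'
}

# Sample rows copied from the question.
dedupe_fields <<'EOF'
Oslo        Company1           Mission1
Oslo        Company1           Mission2
Oslo        Company3           Missionspecial
Oslo        Companyspecial     Missionspecial
Paris       Company2           Mission1
Paris       Companyspecial     Mission2
Paris       Company3           Missionspecial
EOF
```

For an actual file you would drop the here-document and run `dedupe_fields < yourfile` (the file name is yours to fill in). The column widths 12 and 19 are fixed in the printf, so adjust them if your real values are longer.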


Edit

Updated the code to address the concern that text in one field could cause the same text in a different field to be blanked. I did this by changing a[$i]++ to a[i,$i]++, so that each value is keyed by its field number as well as by its text.
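
To see why the key matters, here is a small sketch that runs both variants on two made-up rows where "Alpha" and "Beta" swap columns. The row contents are invented for the demonstration, and repeats are replaced with "-" instead of a blank purely so the difference is visible.

```shell
#!/bin/sh
# Two hypothetical rows in which "Alpha" and "Beta" swap columns.
rows='Alpha Beta Gamma
Beta Alpha Gamma'

# Shared key: a value seen in ANY field counts as a repeat.
shared=$(printf '%s\n' "$rows" | awk '{
  for (i = 1; i <= 3; i++) if (a[$i]++) $i = "-"
  print $1, $2, $3
}')

# Per-field key: a value is a repeat only if seen before in the SAME field.
per_field=$(printf '%s\n' "$rows" | awk '{
  for (i = 1; i <= 3; i++) if (a[i,$i]++) $i = "-"
  print $1, $2, $3
}')

printf 'shared key:\n%s\n\nper-field key:\n%s\n' "$shared" "$per_field"
```

With the shared key the second row comes out as "- - -", because "Beta" and "Alpha" were already seen in other columns. With the per-field key it comes out as "Beta Alpha -", and only the genuine repeat in field 3 is removed.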
