What is the best way of removing recurring lines from a file in Bash?

Folks,

I have a file that contains LDAP entries, and I want to remove the "version: 1" lines from the second occurrence onward. I know sed can do things like this, but since I am very new to it, I don't know how to proceed. This is a Solaris 10 machine, and the file looks as follows:

version: 1
dn: uid=tuser1,ou=people,o=example.com,o=isp
cn: tuser1
uidNumber: 3
gidNumber: 3
homeDirectory: /export/home/tuser1
loginShell: /bin/sh
objectClass: posixAccount
objectClass: shadowAccount
objectClass: account
objectClass: top
uid: tuser1
shadowLastChange:
userPassword:

version: 1
dn: uid=tuser2,ou=people,o=example.com,o=isp
uidNumber: 20
cn: tuser1
gidNumber: 3
homeDirectory: /export/home/tuser2
loginShell: /bin/sh
objectClass: posixAccount
objectClass: shadowAccount
objectClass: account
objectClass: top
uid: tuser1
shadowLastChange:
userPassword: 

version: 1
dn: uid=tuser3,ou=people,o=example.com,o=isp
uidNumber: 10
cn: tuser3
gidNumber: 3
homeDirectory: /export/home/tuser3
loginShell: /bin/sh
objectClass: posixAccount
objectClass: shadowAccount
objectClass: account
objectClass: top
uid: tuser3
shadowLastChange:
userPassword: 

version: 1
dn: uid=loperp,ou=people,o=example.com,o=isp
uid: loperp
userPassword:
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top
sn: pop
cn: loper

version: 1
dn: uid=tuser4,ou=people,o=example.com,o=isp
userPassword: 
uid: tuser4
objectClass: inetOrgPerson
objectClass: organizationalPerson
objectClass: person
objectClass: top
sn: User4
cn: Test

With GNU sed:

sed -ni '0,/version: 1/{p; d}; /version: 1/!p' ldap.txt

EDIT: This was initially wrong: when the first line wasn't a "version: 1" line, it printed duplicates.

The GNU version is simpler. It prints (p) every line from the beginning up to and including the first line matching the version regex. For each line in that range, after printing, it deletes the pattern space and starts a new cycle (d): control goes back to the beginning of the script with the next input line, which avoids printing the line twice. Unlike the standard 1,/regex/ range, 0,/regex/ ends on line 1 if line 1 already matches, instead of continuing to the next matching line.

If d has not run (so we are past the first "version: 1"), we simply print every line that doesn't match the regex (!).
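A quick way to see the 0,/regex/ versus 1,/regex/ difference described above, plus the full script run on a tiny made-up sample. This is a sketch that assumes GNU sed (0,/regex/ is a GNU extension); the inputs are illustrative only, and nothing is edited in place.

```shell
#!/bin/sh
# Assumes GNU sed: the 0,/regex/ address is a GNU extension.

# Line 1 already matches /x/, so 0,/x/ ends on line 1.
# prints: x
printf 'x\ny\nx\nz\n' | sed -n '0,/x/p'

# 1,/x/ only starts testing the end regex on line 2,
# so the range runs on to line 3.
# prints: x, y, x (one per line)
printf 'x\ny\nx\nz\n' | sed -n '1,/x/p'

# The answer's script on a small LDIF-like sample, printing to stdout.
# prints: "version: 1", "a", an empty line, "b"
printf 'version: 1\na\n\nversion: 1\nb\n' |
  sed -n '0,/version: 1/{p; d}; /version: 1/!p'
```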

With standard sed:

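sed -n 'p; /version: 1/ b nov; d; :nov /version: 1/!p; n; b nov' ldap.txt > ldap.new

Standard sed has no -i option, so the output goes to a new file that you can then move over the original.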

This begins by printing every line (p). After that print, if the line matches the regex, we branch to the nov (no version) label; the label name is up to us. If we do not branch, d deletes the pattern space and starts a new cycle (next line, beginning of script). In nov, we print the line if it does not match (same as GNU). We then read the next line (n) and branch back to nov. This loop continues until the end of the input.
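For reference, a sketch of the same standard-sed logic written one command per line and wrapped in a shell function. strip_later_versions is a hypothetical name, and the sample input is made up; the comments sit in the shell rather than inside the sed script, since not every sed accepts # comments.

```shell
#!/bin/sh
# Hypothetical helper: reads the named files (or stdin), writes to stdout.
# p prints every line once; the first "version: 1" branches to :nov,
# and any other line is dropped by d. Inside the nov loop, /!p prints
# non-version lines, n fetches the next line (sed stops there at end of
# input), and b nov repeats.
strip_later_versions() {
  sed -n 'p
/version: 1/b nov
d
:nov
/version: 1/!p
n
b nov' "$@"
}

# prints three lines: "version: 1", "a", "b"
printf 'version: 1\na\nversion: 1\nb\n' | strip_later_versions
```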

I (Jonathan Leffler) can confirm @kuti's observations on Solaris 10 standard 'sed'; what works is:

/bin/sed -n 'p
/version: 1/ b nov
d
:nov
/version: 1/!p
n
b nov' ldap.txt

The 'semicolons in lieu of newlines' trick does not seem to work universally with Solaris 'sed'. Specifically, at a minimum, no semicolon may follow any use of a label, whether that is the :nov definition or the b nov branch.

This seems to work:

/bin/sed -n 'p; /version: 1/ b nov
d; :nov
/version: 1/!p; n; b nov' ldap.txt

(I can't think how to present the fix in a comment; the multiline formatting is crucial here.)
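Another way around the semicolon problem, sketched under the assumption that your sed treats each -e argument as a separate script line (GNU sed documents this behavior): give each command, including the label and the branch, its own -e. strip_later_versions_e is a hypothetical helper name.

```shell
#!/bin/sh
# Each -e is its own script line, so the label and the branch
# end at a line break and never need a trailing semicolon.
strip_later_versions_e() {
  sed -n -e p \
         -e '/version: 1/b nov' \
         -e d \
         -e ':nov' \
         -e '/version: 1/!p' \
         -e n \
         -e 'b nov' "$@"
}

# Standard sed has no -i, so write to a new file and rename it, e.g.:
# strip_later_versions_e ldap.txt > ldap.txt.new && mv ldap.txt.new ldap.txt
```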

A simple answer uses awk:

awk '{ if ($0 ~ /^version: 1$/) { if (count++ == 0) print; }
       else print;
     }' ldap.txt

This assumes that you really mean you want only the first 'version: 1' line and don't mind keeping multiple 'version: 2' lines, etc.
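If the line to keep should be matched literally rather than as a regex, a small variation of the same idea passes it in with -v and compares with ==. A sketch; keep_first_line is a hypothetical helper name.

```shell
#!/bin/sh
# Keep the first line equal to $1 and drop later equal lines;
# every other line passes through unchanged.
keep_first_line() {
  line=$1
  shift
  # seen++ is 0 (false) the first time, so that line is printed;
  # later matches hit next and are skipped.
  awk -v line="$line" '$0 == line { if (seen++) next } { print }' "$@"
}

# Usage: keep_first_line 'version: 1' ldap.txt > ldap.txt.new
```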

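Here's another awk version. Note that it does not delete the later "version: 1" lines; gsub empties them, leaving blank lines in their place: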

awk '/version: 1/{c++}c>1{gsub("version: 1","")}1' file
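Because gsub only empties the later lines, a variant that drops them entirely can skip them with next instead. A sketch; drop_later_versions is a hypothetical name.

```shell
#!/bin/sh
# Count "version: 1" lines and skip every one after the first;
# all other lines are printed by the trailing 1 (always-true pattern).
drop_later_versions() {
  awk '/version: 1/ { if (++c > 1) next } 1' "$@"
}

# Usage: drop_later_versions ldap.txt > ldap.txt.new
```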

Using ed (see man 1 ed), we can mark the line containing the first match with k and add 1 to its address to get:

#  'm+1,$  
#  ... which creates a line address space of:  
#  /first line matched + 1/,/last line/

# http://wiki.bash-hackers.org/doku.php?id=howto:edit-ed
[[ $(grep -c -m 1 '^version: 1' file) -eq 1 ]] && \
cat <<-'EOF' | sed -e 's/^ *//' -e 's/ *$//' | ed -s file
   H
  /^version: 1/km
  'm+1,$g/^version: 1/d
  wq
EOF
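The same ed commands can also be fed with printf, which avoids the here-document and the sed clean-up step. A sketch: delete_later_versions_ed is a hypothetical helper, grep -q replaces grep -m (which not every grep supports), and on Solaris 10 a grep with -q may be /usr/xpg4/bin/grep rather than /usr/bin/grep.

```shell
#!/bin/sh
# Delete every "version: 1" line after the first, editing the file with ed.
delete_later_versions_ed() {
  file=$1
  # Nothing to do if the file has no "version: 1" line at all.
  grep -q '^version: 1' "$file" || return 0
  # H: verbose errors. /re/km: mark the first match as m (ed starts on
  # the last line, so the forward search wraps to line 1).
  # 'm+1,$g/re/d: delete later matches. w, q: write and quit.
  printf '%s\n' \
    'H' \
    '/^version: 1/km' \
    "'m+1,\$g/^version: 1/d" \
    'w' \
    'q' | ed -s "$file"
}

# Usage: delete_later_versions_ed ldap.txt
```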