How do I remove part of a line in a multi-line chunk using sed or Perl?

Developer (https://www.devze.com), 2023-02-07 12:40. Source: Internet

I have some data that looks like this. It comes in chunks of four lines. Each chunk starts with an @ character.


@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
888888888888888888888888888

On the third line of each chunk, I want to remove the text that comes after the + character, resulting in:

@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888

Is there a compact way to do that in sed or Perl?

Assuming you don't want to blindly strip the rest of every line that starts with a +, you can do this:

sed '/^@/{N;N;s/\n+.*/\n+/}' infile

Output

$ sed '/^@/{N;N;s/\n+.*/\n+/}' infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me

Note: Although the above command keys on the @ to decide whether a line with a + should be altered, it will still match at the 2nd line if that line also happens to start with a +, deleting the rest of the 2nd line and the whole 3rd line. It doesn't sound like this is the case, but if you want to exclude this corner case as well, the following minor alteration will protect against it:

sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' infile

Output

$ sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' ./infile
@SRR037212.1 FC30L5TAA_102708:7:1:741:1355 length=27
+AAAAAAAAAAAAAAAAAAAAAAAAAAA
+
::::::::::::::::::::::::;;8
@SRR037212.2 FC30L5TAA_102708:7:1:1045:1765 length=27
TATAACCAGAAAGTTACAAGTAAACAC
+
888888888888888888888888888
+Dont remove me
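
To make the difference between the two sed forms easier to see, here is a small sketch that runs both on a made-up record whose second line starts with a +. The file name sample.fq and its contents are assumptions for illustration only, and the \n in the replacement text assumes GNU sed, as the commands above do.

```shell
# Hypothetical sample: the sequence line (line 2) starts with a "+".
cat > sample.fq <<'EOF'
@read1 desc length=4
+CGT
+read1 desc length=4
;;88
EOF

# First form: the leftmost "\n+" is the one in front of line 2,
# so ".*" swallows the rest of the pattern space (lines 2 and 3).
first=$(sed '/^@/{N;N;s/\n+.*/\n+/}' sample.fq)

# Second form: the greedy "\(.*\)" pushes the match to the last "\n+",
# so only line 3 is trimmed.
second=$(sed '/^@/{N;N;s/\(.*\)\n+.*/\1\n+/}' sample.fq)

printf '%s\n---\n%s\n' "$first" "$second"
```

The first result loses the +CGT line entirely, while the second keeps it and reduces only the third line to a bare +.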

If there is never a + on the first or second lines and always one on the third line:

perl -0100pi -e's/\+.*/+/' datafile

Otherwise:

perl -0100pi -e's/^((?:.*\n){2}.*?\+).*/$1/' datafile

or on 5.10+:

perl -0100pi -e's/^(?:.*\n){2}.*?\+\K.*//' datafile

All of those assume @ only appears at the start of a chunk. If it may appear elsewhere (for example in a quality line), and every chunk is exactly four lines, then:

perl -pi -e's/\+.*/+/ if $. % 4 == 3' datafile

If you can use awk, you can do:

 gawk '{if ($0 ~ /^@/ ) { print ; getline ; print ; getline ; print "+"}}' INPUTFILE

So if gawk sees an @ at the start of a line, it prints that line, then reads and prints the next line, and finally reads the 3rd line (counting from the @) and prints only a + in its place.

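If the + is not at the start of the line, you can use gensub(/\+.*/, "+", 1) instead of the "+" in the last print. (gensub takes the regexp, the replacement, and which match to replace; it works on $0 when no target is given.)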

(And if you have an older Perl installed, before 5.22, there will most probably be an a2p executable, which can convert the above awk script to Perl, if you want to...)

HTH

UPDATE (on missing 4th line):

 gawk '{if ($0 ~ /^@/ ) { print ; getline ; print ; getline ; print "+"; getline; print }}' INPUTFILE

This should print the 4th line as well.
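
The same result can probably be had without getline by counting lines instead. A sketch using plain awk's sub() rather than gawk's gensub(), assuming every chunk is exactly four lines; sample.fq is a made-up file:

```shell
# Hypothetical one-record sample.
cat > sample.fq <<'EOF'
@read1 desc length=4
ACGT
+read1 desc length=4
;;88
EOF

# NR is the current line number; on the third line of each chunk,
# replace everything from the first "+" onward with a bare "+".
awk_out=$(awk 'NR % 4 == 3 { sub(/\+.*/, "+") } { print }' sample.fq)
printf '%s\n' "$awk_out"
```

Because every line falls through to the final print, the 4th line is kept without any special handling.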

Maybe just sed '/^@/+2 s/+.*/+/'

Edit: this will not work, because sed has no /pattern/+N address form, but as a Vim command it should work:

vim file -c ':g/^@/+2s/+.*/+/' -c 'wq'

This might work for you:

sed '/^@/{$!N;$!N;$!N;s/\n+[^\n]*/\n+/g}' file

or with GNU sed:

sed '/^@/,+3s/^+.*/+/' file
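
One caveat: both commands above would also rewrite a 4th line that happens to begin with a +. A hedged alternative sketch, not taken from the answers here, steps through each chunk with n so the 4th line is printed without ever being tested; sample.fq is a made-up file:

```shell
# Hypothetical sample: one quality line starts with "+", the other with "@".
cat > sample.fq <<'EOF'
@read1 desc length=4
ACGT
+read1 desc length=4
+@;;
@read2 desc length=4
TTTT
+read2 desc length=4
@+88
EOF

# On a header line: the first n prints it and loads the sequence line, the
# second n prints that and loads the "+" line, s trims it, and the last n
# prints it and loads the quality line, which is printed as-is when the
# cycle ends.
stepped=$(sed '/^@/{n;n;s/^+.*/+/;n;}' sample.fq)
printf '%s\n' "$stepped"
```

This relies on every chunk having exactly four lines, the same assumption the range form makes.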
