Change column according to previous line with conditions

开发者 https://www.devze.com 2023-03-31 14:53 Source: web

I have files with the format:

ATOM   3736  CB  THR A 486      -6.552 153.891  -7.922  1.00115.15           C  
ATOM   3737  OG1 THR A 486      -6.756 154.842  -6.866  1.00114.94           O  
ATOM   3738  CG2 THR A 486      -7.867 153.727  -8.636  1.00115.11           C  
ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O  
HETATM10351  C1  NAG B 203      33.671  87.279  39.456  0.50 90.22           C  
HETATM10483  C1  NAG Z 702      28.025 104.269 -27.569  0.50 92.75           C    
ATOM   3736  CB  THR X 486      -6.552  86.240   7.922  1.00115.15           C  
ATOM   3737  OG1 THR X 486      -6.756  85.289   6.866  1.00114.94           O  
ATOM   3738  CG2 THR X 486      -7.867  86.404   8.636  1.00115.11           C  
ATOM   3739  OXT THR X 486      -4.978  88.874   9.140  1.00115.13           O  
HETATM10351  C1  NAG Y 203      33.671 152.852 -39.456  0.50 90.22           C  
HETATM10639  C2  FUC C 402     -48.168 162.221 -22.404  0.50103.03           C  

For each block of lines starting with HETATM, I would like to change column 5 (the chain ID) to match that of the preceding ATOM block. That is, in the first HETATM block both B and Z should become A, whereas in the second HETATM block both Y and C should become X.

A second question, purely out of curiosity: how would I split the file after each line starting with HETATM, but only if the next line starts with ATOM?


Try this:

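awk '{
  if ($1 == "ATOM") {
    # The chain ID sits in fixed column 22; $5 is unreliable because
    # "HETATM10351" is read as a single field, which shifts the count.
    chain = substr($0, 22, 1)
  }
  else if ($0 ~ /^HETATM/ && chain != "") {
    # Overwrite column 22 only, so the fixed-width layout is preserved.
    $0 = substr($0, 1, 21) chain substr($0, 23)
  }
  print
}' infile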


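awk '/^ATOM/{c=substr($0,22,1)} /^HETATM/&&c!=""{$0=substr($0,1,21) c substr($0,23)} 1' file

This reads and rewrites the chain ID by character position (column 22) instead of using $5. Assigning to $5 makes awk rebuild the line with single spaces between fields, and -F" " does not prevent that, because a single space is already the default field separator. In addition, HETATM10351 is parsed as one field, so on those lines $5 is the residue number rather than the chain ID.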


Here is my solution, an awk script that solves the first problem (replacing the fifth field) while preserving whitespace:

$1=="ATOM" {
    fifthField=$5

    # Find the character position at which field 5 starts
    fifthField_index = 1
    for (i = 0; i < 4; i++) {
        # Skip until white space
        for (; substr($0, fifthField_index, 1) != " "; fifthField_index++) { }
        # Skip white spaces
        for (; substr($0, fifthField_index, 1) == " "; fifthField_index++) { }
    }

    print;next
}

/^HETATM/ {
    before_fifthField = substr($0, 1, fifthField_index - 1)
    after_fifthField = substr($0, fifthField_index + 1, length($0))
    print before_fifthField fifthField after_fifthField
    next
}

1

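It is not the most elegant solution, but it works. It assumes that the fifth field is a single character and that it sits at the same character position on the HETATM lines as on the preceding ATOM line, which holds for the fixed-width sample above.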
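For the second question (splitting after a HETATM line only when the next line is ATOM), one possible approach is to remember the previous line and switch to a new output file whenever an ATOM line directly follows a HETATM line. The following is a minimal sketch, not a tested production script: the file names sample.pdb and part_N.pdb are placeholders that do not come from the original post, and the sample input is a shortened copy of the data in the question.

```shell
# Hypothetical sample input (sample.pdb is a placeholder name).
cat > sample.pdb <<'EOF'
ATOM   3739  OXT THR A 486      -4.978 151.257  -9.140  1.00115.13           O
HETATM10351  C1  NAG B 203      33.671  87.279  39.456  0.50 90.22           C
HETATM10483  C1  NAG Z 702      28.025 104.269 -27.569  0.50 92.75           C
ATOM   3736  CB  THR X 486      -6.552  86.240   7.922  1.00115.15           C
HETATM10351  C1  NAG Y 203      33.671 152.852 -39.456  0.50 90.22           C
EOF

awk '
  # An ATOM line directly after a HETATM line starts a new chunk:
  # close the current output file and advance the chunk counter.
  /^ATOM/ && prev ~ /^HETATM/ { close(out); n++ }
  {
    out = "part_" (n + 1) ".pdb"   # placeholder output naming scheme
    print > out                    # lines are written unchanged
    prev = $0                      # remember this line for the next test
  }
' sample.pdb
```

With this input the first three lines should end up in part_1.pdb and the last two in part_2.pdb. Because the lines are printed without touching any field, the original spacing is kept.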
