开发者

deleting selected lines from data file

开发者 https://www.devze.com 2023-04-03 05:19 出处:网络
This question is continuation from my earlier post titled \"selecting digits from regular expression\".

This question is continuation from my earlier post titled "selecting digits from regular expression".

Below is the sample data as posted in the earlier post.

          DONOR         ACCEPTORH      ACCEPTOR           
    atom#  res@atom   atom#  res@atom atom#  res@atom %occupied  distance       angle        
  |  4726   59@O12 |  1487    19@H12  1486    19@O12 |  85.66  2.819 ( 0.18)  21.85 (12.11)        
  |  1499   19@O15 |  1730    22@H12  1729    22@O12 |  83.15  3.190 ( 0.31)  22.36 (12.73)        
  |  1216   16@O22 |  1460    19@H22  1459    19@O22 |  75.74  2.757 ( 0.14)  24.55 (13.66)        
  |  4232   53@O25 |  4143    52@H24  4142    52@O24 |  74.35  2.916 ( 0.25)  28.27 (13.26)        
  |  3683   46@O16 |  4163    52@H13  4162    52@O13 |  73.78  2.963 ( 0.29)  23.65 (14.14)        
  |  4162   52@O13 |  4079    51@H12  4078开发者_高级运维    51@O12 |  73.68  2.841 ( 0.19)  21.25 (11.87)        
  |  3764   47@O16 |  3825    48@H26  3824    48@O26 |  70.52  2.973 ( 0.28)  26.88 (13.14)        
  .
  .
  The lines goes few thousands.

I tired Fredirk's code and it works fine for selecting the lines. Well, now I would like to extend this idea to my real problem.

The $3 (3rd field) and $6 (6th field) in my data file represent "number-molecule" which has arrangement as below:

   1    2   3   4   5   6   7       8

   9    10  11  12  13  14  15      16
  17    18  19  20  21  22  23      24
  25    26  27  28  29  30  31      32
  33    34  35  36  37  38  39      40
  41    42  43  44  45  46  47      48
  49    50  51  52  53  54  55      56

  57    58  59  60  61  62  63      64 

Any pairs made from above numbers actually represents pairs in the 3rd and 6th field of each line in the data file.

What I want is to select the pairs made only by numbers which arranged at the outer most lines of the above ordering.

 In short, ANY PAIRS made by only the numbers  (1 2 3 4 5 6 7 8   57 58 59 60 61 62 63 64   1 9 17 25 33 41 49 57   8 16 24 32 40 48 56 64) are need to be deleted.

I have no idea how to write loop in awk code to select those pairs and delete the lines straight away.

I wish to say many thanks in advance.


Use an array to hold the set of numbers. Define it in the BEGIN block

BEGIN {
  i=0
  for (n=1; n<=8; n++) set[i++] = n
  for (n=57; n<=64; n++) set[i++] = n
  for (n=9; n<=49; n+=8) {set[i++] = n; set[i++] = n+7}
}

Then, check that $3 and $6 are both in (or not in) the set:

($3 in set) && ($6 in set) {next}
0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号