AWK: is there some flag to ignore comments?

Developer https://www.devze.com 2022-12-27 13:26 Source: Internet

Comment rows are counted in the NR.

  1. Is there some flag to ignore comments?
  2. How can you limit the range in AWK, not like piping | sed -e '1d', to ignore comment rows?

Example

$ awk '{sum+=$3} END {avg=sum/NR} END {print avg}' coriolis_data
0.885491                          // WRONG divided by 11, should be by 10
$ cat coriolis_data 
#d-err-t-err-d2-err
.105    0.005   0.9766  0.0001  0.595   0.005
.095    0.005   0.9963  0.0001  0.595   0.005
.115    0.005   0.9687  0.0001  0.595   0.005
.105    0.005   0.9693  0.0001  0.595   0.005
.095    0.005   0.9798  0.0001  0.595   0.005
.105    0.005   0.9798  0.0001  0.595   0.005
.095    0.005   0.9711  0.0001  0.595   0.005
.110    0.005   0.9640  0.0001  0.595   0.005
.105    0.005   0.9704  0.0001  0.595   0.005
.090    0.005   0.9644  0.0001  0.595   0.005


It is best not to touch NR; use a separate variable to count the rows instead. This version skips comment lines as well as blank lines.

$ awk '!/^[ \t]*#/&&NF{sum+=$3;++d}END{ave=sum/d;print ave}' file
0.97404


Just decrement NR yourself on comment lines:

 awk '/^[[:space:]]*#/ { NR-- } {sum+=$3} END { ... }' coriolis_data

Okay, that answers the question you asked. Here is an answer to the question you really meant, which also keeps comment lines out of the sum:

 awk '{ if ($0 ~ /^[[:space:]]*#/) {NR--} else {sum+=$3} } END { ... }' coriolis_data

(It's more awk-ish to use patterns outside the blocks as in the first answer, but to do it that way, you'd have to write your comment pattern twice.)

Edit: Will suggests in the comments using /.../ {NR--; next} to avoid having the if-else block. My thought is that this looks cleaner when you have more complex actions for the matching records, but doesn't matter too much for something this simple. Take your favorite!
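
For what it's worth, here is a minimal sketch of that `{NR--; next}` variant. The three-line input is a made-up sample rather than the full coriolis_data file, the END block is my own guess at what the elided one would do, and `[ \t]` stands in for `[[:space:]]` only so that it also runs on older awk implementations.

```shell
# Hypothetical sample: one comment row followed by two data rows.
avg=$(printf '#d-err-t-err-d2-err\n.105 0.005 0.9766\n.095 0.005 0.9963\n' |
  awk '/^[ \t]*#/ { NR--; next }   # undo the count for a comment row, then skip it
       { sum += $3 }               # only data rows reach this block
       END { print sum / NR }')    # NR now equals the number of data rows
echo "$avg"
```

Because `next` stops processing of the comment record, the pattern only has to be written once and no if-else is needed.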


Another approach is to use a conditional statement...

awk '{ if( $1 != "#" ){ print $0 } }' coriolis_data

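This tells awk to skip lines whose first field is exactly #. Of course, this requires the comment character # to stand alone at the beginning of a comment line, separated from the comment text by whitespace, so the header line #d-err-t-err-d2-err in coriolis_data would not be skipped.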
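
To make that limitation concrete, here is a small sketch on a made-up three-line input (not the original file). The first command is the exact-match test from above; the second, `$1 !~ /^#/`, is an alternative I am assuming you might prefer, which only checks that the first field starts with #.

```shell
# Hypothetical input: "#" is its own field in the first comment, but not in the second.
sample='# spaced comment
#d-err-t-err-d2-err
.105 0.005 0.9766'

# Exact comparison: only the "# spaced comment" row is dropped.
out_exact=$(printf '%s\n' "$sample" | awk '{ if ($1 != "#") { print $0 } }')

# Prefix match on the first field: both comment rows are dropped.
out_prefix=$(printf '%s\n' "$sample" | awk '$1 !~ /^#/')

printf '%s\n---\n%s\n' "$out_exact" "$out_prefix"
```

Since awk strips leading blanks when it splits fields, the prefix test also catches indented comments.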


There is a SIMPLER way to do it!

$ awk '!/#/ {print $0}' coriolis_data
.105 0.005 0.9766 0.0001 0.595 0.005
.095 0.005 0.9963 0.0001 0.595 0.005
.115 0.005 0.9687 0.0001 0.595 0.005
.105 0.005 0.9693 0.0001 0.595 0.005
.095 0.005 0.9798 0.0001 0.595 0.005
.105 0.005 0.9798 0.0001 0.595 0.005
.095 0.005 0.9711 0.0001 0.595 0.005
.110 0.005 0.9640 0.0001 0.595 0.005
.105 0.005 0.9704 0.0001 0.595 0.005
.090 0.005 0.9644 0.0001 0.595 0.005

Correction: no, it is not!

$ awk '!/#/ {sum+=$3}END{ave=sum/NR}END{print ave}' coriolis_data 
0.885491    // WRONG.
$ awk '{if ($0 ~ /^[[:space:]]*#/){NR--}else{sum+=$3}}END{ave=sum/NR}END{print ave}' coriolis_data
0.97404     // RIGHT.


The file that you give AWK to parse is not a source file; it is data, so AWK knows nothing about its conventions. In other words, for AWK, lines beginning with # are nothing special.

That said, of course you can skip comments, but you have to write the logic for it yourself: tell AWK to ignore every line that starts with "#" and count the remaining lines yourself.

awk 'BEGIN {lines=0} {if(substr($1, 1, 1) != "#") {sum+=$3; lines++} } END {avg=sum/lines} END {print avg}' coriolis_data

You can, of course, indent it for better readability.
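
As a rough idea of what an indented version could look like, here is a sketch run on a made-up three-line sample instead of the real file. It uses substr($1, 1, 1), since awk string positions start at 1, and folds the two END blocks into one; both are my own choices for the illustration.

```shell
# Hypothetical sample: one comment row followed by two data rows.
avg=$(printf '#d-err-t-err-d2-err\n.105 0.005 0.9766\n.095 0.005 0.9963\n' |
  awk '
    BEGIN { lines = 0 }
    {
      # Position 1 is the first character of the first field.
      if (substr($1, 1, 1) != "#") {
        sum += $3      # accumulate the third column
        lines++        # count only the rows that were summed
      }
    }
    END { print sum / lines }
  ')
echo "$avg"
```

For anything longer than this, putting the program in its own file and running it with awk -f is probably easier to maintain.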


I would remove them with sed first, then remove the resulting blank lines with grep.

sed 's/#.*//' < coriolis_data | grep -v '^$' | awk ...
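
Here is a sketch of how the whole pipeline might look end to end. The input is a made-up sample with a comment row, a blank row, and a trailing comment, and the awk stage is only my assumption about what would go in place of the "...". Once the comment and blank rows are gone before awk sees the data, NR can be used as the divisor unchanged.

```shell
# Hypothetical sample: comment row, data row, blank row, data row with a trailing comment.
avg=$(printf '#d-err-t-err-d2-err\n.105 0.005 0.9766\n\n.095 0.005 0.9963  # trailing note\n' |
  sed 's/#.*//' |                             # strip everything from "#" to end of line
  grep -v '^$' |                              # drop rows that are now empty
  awk '{ sum += $3 } END { print sum / NR }') # NR counts only the surviving data rows
echo "$avg"
```

One caveat: a comment that is indented leaves a whitespace-only row behind, which `^$` does not match, so a pattern such as '^[[:space:]]*$' may be the safer filter.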
