I want to find the average rainfall for any three states, say CA, TX and AX, for a particular month (January through December). The input file is TAB-delimited and has the format:
city name, state, then the average rainfall amounts for January through December, and finally an annual average for all months.
For example, it may look like:
AVOCA PA 30 2.10 2.15 2.55 2.97 3.65 3.98 3.79 3.32 3.31 2.79 3.06 2.51 36.18
BAKERSFIELD CA 30 0.86 1.06 1.04 0.57 0.20 0.10 0.01 0.09 0.17 0.29 0.70 0.63 5.72
What I want to do is sum the average rainfall for a particular month, say February, over say n years, and then find its average for the states CA, TX and AX.
I have written the awk script below to do this, but it doesn't give me the expected output:
/^CA$/ {CA++; CA_SUM+= $5} # ^CA$ - Regular Expression to match the word CA only
/^TX$/ {TX++; TX_SUM+= $5} # ^TX$ - Regular Expression to match the word TX only
/^AX$/ {AX++; AX_SUM+= $5} # ^AX$ - Regular Expression to match the word AX only
END {
CA_avg = CA_SUM/CA;
TX_avg = TX_SUM/TX;
AX_avg = AX_SUM/AX;
printf("CA Rainfall: %5.2f\n",CA_avg);
printf("TX Rainfall: %5.2f\n",TX_avg);
printf("AX Rainfall: %5.2f\n",AX_avg);
}
I invoke the program with the command
awk 'FS="\t"'-f awk1.awk rainfall.txt
and see no output.
Question: Where am I going wrong? Any suggestions and corrected code would be appreciated.
The pattern /^CA$/
matches only when the characters "C" and "A" are the only characters on the line. You want to compare the state field instead:
$2 == "CA" {CA++; CA_SUM+= $5}
# etc.
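To see the difference between the two patterns, here is a minimal sketch that runs both against two records modelled on the question's sample. The TAB-separated layout and the shortened records are assumptions made for illustration.

```shell
# Two shortened sample records, rebuilt with TAB separators
# (an assumption about how the real file is laid out).
data=$(printf 'AVOCA\tPA\t30\t2.10\t2.15\nBAKERSFIELD\tCA\t30\t0.86\t1.06\n')

# Whole-line regex: matches only a line that consists of exactly "CA".
regex_hits=$(printf '%s\n' "$data" | awk -F '\t' '/^CA$/ { n++ } END { print n + 0 }')

# Field comparison: matches records whose second field is "CA".
field_hits=$(printf '%s\n' "$data" | awk -F '\t' '$2 == "CA" { n++ } END { print n + 0 }')

echo "regex: $regex_hits, field: $field_hits"
```

The whole-line regex counts no records, while the field comparison counts the one CA record.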
However, this is DRYer:
{ count[$2]++; sum[$2] += $5 }
END {
for (state in count) {
printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
}
}
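Also, this invocation is wrong: awk 'FS="\t"'-f awk1.awk rainfall.txt
There is no space between the closing quote and -f, so the shell passes FS="\t"-f to awk as the program text, and awk1.awk is read as an input file instead of a script.
Try: awk -F '\t' -f awk1.awk rainfall.txt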
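The tab separator matters as soon as a city name contains a space. A small sketch, using a hypothetical record (the city and the values are made up for illustration):

```shell
# One hypothetical TAB-separated record whose city name contains a space.
line=$(printf 'LOS ANGELES\tCA\t30\t2.92\t3.07')

# Default splitting on runs of blanks: the city name occupies two fields.
default_state=$(printf '%s\n' "$line" | awk '{ print $2 }')

# Splitting on TAB only: the state is the second field, as intended.
tab_state=$(printf '%s\n' "$line" | awk -F '\t' '{ print $2 }')

echo "default: $default_state, tab: $tab_state"
```

With the default separator, $2 is ANGELES and every rainfall column shifts by one; with -F '\t', $2 is CA.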
Response to comments:
awk -F '\t' -v month=2 -v states="CA,AZ,TX" '
BEGIN {
month_col = month + 3 # January (month 1) is field 4
n = split(states, wanted_states, /,/)
}
{ count[$2]++; sum[$2] += $month_col }
END {
# split() stores the state names as values under indices 1..n
for (i = 1; i <= n; i++) {
state = wanted_states[i]
if (state in count)
printf("%s Rainfall: %5.2f\n", state, sum[state]/count[state])
else
print state " Rainfall: no data"
}
}
' rainfall.txt
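One detail worth knowing about split(): it stores the pieces as values under the numeric indices 1..n, so a for (key in array) loop yields those indices, not the state names. A minimal sketch that prints what split() produces (the array name wanted is only illustrative):

```shell
# Show the index -> value pairs that split() creates for a comma-separated list.
indexed=$(awk 'BEGIN {
    n = split("CA,AZ,TX", wanted, /,/)
    for (i = 1; i <= n; i++) {
        printf "%s%d=%s", sep, i, wanted[i]
        sep = " "
    }
}')

echo "$indexed"
```

A loop that needs the state names therefore has to read wanted[i] rather than use the loop key itself.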
Your regexps should be (this assumes the fields are separated by single spaces; for a TAB-delimited file, compare $2 instead):
/ CA / {CA++; CA_SUM+= $5} # match CA as a separate, space-delimited word
/ TX / {TX++; TX_SUM+= $5} # match TX as a separate, space-delimited word
/ AX / {AX++; AX_SUM+= $5} # match AX as a separate, space-delimited word
/^AX$/ matches only if AX is the only text on the line.
HTH!
EDIT
/ CA / {CA++; CA_SUM+= $5} # match CA as a separate, space-delimited word
/ TX / {TX++; TX_SUM+= $5} # match TX as a separate, space-delimited word
/ AX / {AX++; AX_SUM+= $5} # match AX as a separate, space-delimited word
END {
if(CA!=0){CA_avg = CA_SUM/CA; printf("CA Rainfall: %5.2f\n",CA_avg);}
if(TX!=0){TX_avg = TX_SUM/TX; printf("TX Rainfall: %5.2f\n",TX_avg);}
if(AX!=0){AX_avg = AX_SUM/AX; printf("AX Rainfall: %5.2f\n",AX_avg);}
}