awk: filter specific information (time, average and beta)

Developer https://www.devze.com 2023-03-09 13:52 Source: Internet

I have a collection of data, for example:

 1.00 3 4
 1.00 0 1
 51.00 1 4
 84.00 3 4
 95.00 0 2
 110.00 2 4
 120.00 0 1
 121.00 1 2
 124.00 2 4
 158.00 3 4
 159.00 1 3
 172.00 0 4
 214.00 0 4
 223.00 2 4
 224.00 1 2
 228.00 1 4
 229.00 0 1
 232.00 2 3
 233.00 3 4
 233.00 1 3
 246.00 0 2
 292.00 0 3
 294.00 0 4
 294.00 2 4
 294.00 3 4
 318.00 1 2
 331.00 0 1
 383.00 2 4
 402.00 3 4

The output I want to generate looks like this:

node_src node_dst time_repeated time1 time2 ... average_time ß

Details:

*node_src = 2nd column
*node_dst = 3rd column
*time_repeated = the number of times the same (node_src, node_dst) pair is repeated; for example, 3 4 is repeated 5 times
*time1, time2, ... = the values of column 1 for that pair
*average_time = the mean interval between consecutive times (see the example below)
*ß (beta) = time_repeated / average_time

My attempt generated this result:

node1  node2  nbrepeated    time1  time2  time3  time4  time5  time6  time7  average ß  
2       4       6           110.0  124.0  223.0  294.0  383.0  461.0  543.0  6.0     0
2       3       1           232.0  402.0  0.0    0.0    0.0    0.0    0.0    1.0     0     
1       3       2           159.0  233.0  521.0  0.0    0.0    0.0    0.0    2.0     4      
1       2       4           121.0  224.0  318.0  461.0  573.0  0.0    0.0    4.0     5     
0       4       4           172.0  214.0  294.0  415.0  543.0  0.0    0.0    4.0     5      
0       2       5           95.0   246.0  415.0  536.0  572.0  588.0  0.0    5.0     :      
0       3       3           292.0  403.0  455.0  588.0  0.0    0.0    0.0    3.0     :      
1       4       2           51.0   228.0  494.0  0.0    0.0    0.0    0.0    2.0     :      
0       1       4           1.0    120.0  229.0  331.0  536.0  0.0    0.0    4.0     :      
3       4       6           1.0    84.0   158.0  233.0  294.0  402.0  431.0  6.0     :

I was unable to compute the average time and ß. The average is the mean gap between consecutive times, so for a row such as this one:

    121.0  224.0  318.0  461.0  573.0 

    avg_time = ((224-121)+(318-224)+(461-318)+(573-461))/4

The challenge is to do this dynamically, since the number of time fields is not known in advance. The script is written in bash.

Here is the code, thanks to glenn jackman:

#!/bin/bash

# Collect the times for each "src:dst" pair.
declare -A t

while read tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < "$1"

# Find the largest number of times recorded for any pair.
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo " average_time beta"

    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        f3=$(($# - 1))
        f4=$(($# - 1))   # placeholder: should become the average time
        f5=1             # placeholder: should become beta
        printf "%d %d %d" $f1 $f2 $f3
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done
        printf " %.1f %.1f" $f4 $f5
        echo ""
    done
} | column -t

Modifications still needed:

  1. find the average time : avg_time
  2. find the beta

P.S. Normally people compute an average as sum/NR, but that does not apply here: I need the mean of the intervals between consecutive times.

Solved. Here is the output:

field1  field2  nbrepeated  time1  time2  time3  time4  time5  time6  time7  average_time  beta
2       4       6           110.0  124.0  223.0  294.0  383.0  461.0  543.0  72.16         0.08
2       3       1           232.0  402.0  0.0    0.0    0.0    0.0    0.0    170.00        0.00
1       3       2           159.0  233.0  521.0  0.0    0.0    0.0    0.0    181.00        0.01
1       2       4           121.0  224.0  318.0  461.0  573.0  0.0    0.0    113.00        0.03


First, note that the average formula can be simplified. For example:

121.0  224.0  318.0  461.0  573.0 
= (573.0 - 121.0)/4
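
The simplification holds because the sum of consecutive differences telescopes: every intermediate time cancels, leaving only the last time minus the first. A small sketch to check this numerically is shown here. The helper names `avg_long` and `avg_short` are introduced only for this illustration and are not part of the original script; the arithmetic is done in awk's BEGIN block, assuming the times are passed as arguments in ascending order.

```bash
#!/bin/bash

# avg_long: mean of the consecutive differences, summed term by term.
avg_long() {
    awk 'BEGIN {
        n = ARGC - 1                      # number of times passed in
        for (i = 2; i <= n; i++) sum += ARGV[i] - ARGV[i-1]
        printf "%.2f\n", (n > 1) ? sum / (n - 1) : 0
    }' "$@"
}

# avg_short: the telescoped form, (last - first) / (number of intervals).
avg_short() {
    awk 'BEGIN {
        n = ARGC - 1
        printf "%.2f\n", (n > 1) ? (ARGV[n] - ARGV[1]) / (n - 1) : 0
    }' "$@"
}

avg_long  121.0 224.0 318.0 461.0 573.0   # prints 113.00
avg_short 121.0 224.0 318.0 461.0 573.0   # prints 113.00
```

Both forms agree, so only the first and last time (and the count) are needed per pair.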

I have added the following section to calculate the average and beta:

avg=0
beta=0
if [ $f3 -ne 0 ]
then
   total=$(bc<<<${@: -1}-$1)
   avg=$(bc<<<"scale=2;$total/$f3")
   beta=$(bc<<<"scale=2;$f3/$avg")
fi 
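
Two details of the `bc` approach are worth knowing. With `scale=2`, `bc` truncates instead of rounding, so 433/6 comes out as 72.16 rather than 72.17. Also, if all times for a pair are equal, `avg` is 0 and the `beta` division makes `bc` report a divide-by-zero error. A minimal sketch of an alternative that does the arithmetic in awk and guards both divisions follows; `stats` is a hypothetical helper name, and its three arguments (interval count, first time, last time) are an assumption about how it would be called.

```bash
#!/bin/bash

# stats N FIRST LAST -> prints "avg beta"
#   N     = number of intervals (the script's $f3)
#   FIRST = first time for the pair, LAST = last time for the pair
stats() {
    awk -v n="$1" -v first="$2" -v last="$3" 'BEGIN {
        avg  = (n > 0)   ? (last - first) / n : 0   # no intervals: avg is 0
        beta = (avg > 0) ? n / avg            : 0   # avoid dividing by 0
        printf "%.2f %.2f\n", avg, beta
    }'
}

stats 4 121.0 573.0    # prints 113.00 0.04
stats 0 232.0 232.0    # prints 0.00 0.00
```

Note that awk's `printf` rounds, so the beta for the first call is 0.04 here, where the truncating `bc` version prints 0.03.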

The complete script becomes:

declare -A t

while read tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < f.txt

max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo " average_time beta"

    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        f3=$(($# - 1))

        avg=0
        beta=0
        # don't want to divide by zero if we have only one time
        if [ $f3 -ne 0 ]
        then
            total=$(bc<<<${@: -1}-$1)
            avg=$(bc<<<"scale=2;$total/$f3")
            beta=$(bc<<<"scale=2;$f3/$avg")
        fi

        printf "%d %d %d" $f1 $f2 $f3
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done

        printf " %.2f %.2f" $avg $beta
        echo ""
    done
} | column -t

Output

field1  field2  nbrepeated  time1  time2  time3  time4  time5  time6  average_time  beta
2       4       4           110.0  124.0  223.0  294.0  383.0  0.0    68.25         0.05
2       3       0           232.0  0.0    0.0    0.0    0.0    0.0    0.00          0.00
1       3       1           159.0  233.0  0.0    0.0    0.0    0.0    74.00         0.01
1       2       2           121.0  224.0  318.0  0.0    0.0    0.0    98.50         0.02
0       4       2           172.0  214.0  294.0  0.0    0.0    0.0    61.00         0.03
0       2       1           95.0   246.0  0.0    0.0    0.0    0.0    151.00        0.00
0       3       0           292.0  0.0    0.0    0.0    0.0    0.0    0.00          0.00
1       4       1           51.0   228.0  0.0    0.0    0.0    0.0    177.00        0.00
0       1       3           1.0    120.0  229.0  331.0  0.0    0.0    110.00        0.02
3       4       5           1.0    84.0   158.0  233.0  294.0  402.0  80.20         0.06
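
Since the question title mentions awk, the same calculation can also be sketched as a single awk pass. This is an illustrative variant, not the script used above: the function name `summarize` and the column order (pair, interval count, average, beta, then the times) are assumptions made here, and it relies on the input already being sorted by time, as in the sample data. Putting the times last avoids having to pad rows with zeros.

```bash
#!/bin/bash

# summarize: reads "time src dst" lines on stdin and prints, per pair:
#   src dst intervals average_time beta time1 time2 ...
summarize() {
    awk '
    NF == 3 {
        key = $2 " " $3
        if (!(key in n)) order[++pairs] = key   # remember first-seen order
        n[key]++
        if (n[key] == 1) first[key] = $1        # earliest time for the pair
        last[key]  = $1                         # latest time seen so far
        times[key] = times[key] " " $1
    }
    END {
        for (p = 1; p <= pairs; p++) {
            key  = order[p]
            gaps = n[key] - 1                   # number of intervals
            avg  = (gaps > 0) ? (last[key] - first[key]) / gaps : 0
            beta = (avg > 0)  ? gaps / avg : 0
            printf "%s %d %.2f %.2f%s\n", key, gaps, avg, beta, times[key]
        }
    }'
}
```

A possible invocation, assuming the data is in `f.txt` as in the script above, is `summarize < f.txt | column -t`.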