I have a collection of data, for example:
1.00 3 4
1.00 0 1
51.00 1 4
84.00 3 4
95.00 0 2
110.00 2 4
120.00 0 1
121.00 1 2
124.00 2 4
158.00 3 4
159.00 1 3
172.00 0 4
214.00 0 4
223.00 2 4
224.00 1 2
228.00 1 4
229.00 0 1
232.00 2 3
233.00 3 4
233.00 1 3
246.00 0 2
292.00 0 3
294.00 0 4
294.00 2 4
294.00 3 4
318.00 1 2
331.00 0 1
383.00 2 4
402.00 3 4
The output that I want to generate looks like this:
node_src node_dst time_repeated time1 time2 ... average_time ß
Details:
*node_src = 2nd column
*node_dst = 3rd column
*time_repeated = the number of times the same node pair is repeated, for example 3 4 is repeated 5 times
*time1, time2, ... = the values of column 1 for that pair
*average_time = the average of the intervals between consecutive times,
see the example below
*ß = time_repeated / average_time
My attempt generated this result:
node1 node2 nbrepeated time1 time2 time3 time4 time5 time6 time7 average ß
2 4 6 110.0 124.0 223.0 294.0 383.0 461.0 543.0 6.0 0
2 3 1 232.0 402.0 0.0 0.0 0.0 0.0 0.0 1.0 0
1 3 2 159.0 233.0 521.0 0.0 0.0 0.0 0.0 2.0 4
1 2 4 121.0 224.0 318.0 461.0 573.0 0.0 0.0 4.0 5
0 4 4 172.0 214.0 294.0 415.0 543.0 0.0 0.0 4.0 5
0 2 5 95.0 246.0 415.0 536.0 572.0 588.0 0.0 5.0 :
0 3 3 292.0 403.0 455.0 588.0 0.0 0.0 0.0 3.0 :
1 4 2 51.0 228.0 494.0 0.0 0.0 0.0 0.0 2.0 :
0 1 4 1.0 120.0 229.0 331.0 536.0 0.0 0.0 4.0 :
3 4 6 1.0 84.0 158.0 233.0 294.0 402.0 431.0 6.0 :
I was unable to find the average time and ß because of how the average time has to be calculated. For example, for these times:
121.0 224.0 318.0 461.0 573.0
avg_time = ((224-121)+(318-224)+(461-318)+(573-461))/4
The challenge is to compute this dynamically, since the number of time fields is not known in advance. The script is written in bash.
Here is the code, thanks to glenn jackman:
#!/bin/bash
# Group the times by "src:dst" key.
declare -A t
while read tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < "$1"

# Find the largest number of times recorded for any key.
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo " average_time beta"
    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        f3=$(($# - 1))
        f4=$(($# - 1))   # placeholder: average_time is not computed yet
        f5=1             # placeholder: beta is not computed yet
        printf "%d %d %d" $f1 $f2 $f3
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done
        printf " %.1f %.1f" $f4 $f5
        echo ""
    done
} | column -t
Modifications still needed:
- find the average time: avg_time
- find the beta
P.S.: normally people compute an average as sum/NR,
but that does not apply to my question.
Solved: here is the output
field1 field2 nbrepeated time1 time2 time3 time4 time5 time6 time7 average_time beta
2 4 6 110.0 124.0 223.0 294.0 383.0 461.0 543.0 72.16 0.08
2 3 1 232.0 402.0 0.0 0.0 0.0 0.0 0.0 170.00 0.00
1 3 2 159.0 233.0 521.0 0.0 0.0 0.0 0.0 181.00 0.01
1 2 4 121.0 224.0 318.0 461.0 573.0 0.0 0.0 113.00 0.03
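First, note that the average formula can be simplified, because the intermediate values cancel out when the intervals are added up. For example, for the times:
121.0 224.0 318.0 461.0 573.0
avg_time = ((224-121)+(318-224)+(461-318)+(573-461))/4 = (573.0 - 121.0)/4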
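The shortcut can be checked numerically. Below is a minimal sketch that sums the intervals one by one and compares the result with "last minus first". The times are written as integers only so that plain bash arithmetic can be used, and the variable names (times, sum, span, intervals) are illustrative, not part of the original script:

```bash
#!/bin/bash
# Example times from the question, written as integers (an assumption
# made so that $(( )) arithmetic works; the real data has a decimal part).
times=(121 224 318 461 573)

# Sum every consecutive interval explicitly.
sum=0
for ((i = 1; i < ${#times[@]}; i++)); do
    sum=$(( sum + times[i] - times[i-1] ))
done

# Shortcut: last value minus first value.
n=${#times[@]}
span=$(( times[n-1] - times[0] ))
intervals=$(( n - 1 ))

echo "sum of intervals: $sum"
echo "last - first:     $span"
echo "average:          $(( span / intervals ))"
```

Both totals come out as 452, so the average is 452/4 = 113, which matches the 113.00 shown for the pair 1 2 in the question's output.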
I have added the following section to calculate the average and beta:
avg=0
beta=0
if [ $f3 -ne 0 ]
then
    total=$(bc <<< "${@: -1} - $1")    # last time minus first time
    avg=$(bc <<< "scale=2; $total/$f3")
    beta=$(bc <<< "scale=2; $f3/$avg")
fi
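This section relies on the positional parameters: in the script, set -- ${t[$key]} splits the stored string of times into $1, $2, and so on, so $1 is the first time and ${@: -1} is the last one (the space before -1 is required). Here is a small sketch of just that mechanism, with a hard-coded string standing in for ${t[$key]}; the names times_for_key, first, last and count are made up for the illustration:

```bash
#!/bin/bash
# Stand-in for ${t[$key]}: the times collected for one node pair.
times_for_key=" 121.00 224.00 318.00 461.00 573.00"

# Unquoted on purpose: word splitting turns the string into $1..$5.
set -- $times_for_key

first=$1
last=${@: -1}    # last positional parameter; note the space before -1
count=$#

echo "count=$count first=$first last=$last"
```

With these values the script prints count=5 first=121.00 last=573.00, and the number of intervals is count - 1, which is what f3 holds in the full script.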
The complete script becomes:
#!/bin/bash
# Group the times by "src:dst" key (associative arrays need bash 4 or later).
declare -A t
while read tm f1 f2; do
    t["$f1:$f2"]+=" $tm"
done < f.txt

# Find the largest number of times recorded for any key.
max=0
for key in "${!t[@]}"; do
    set -- ${t[$key]}
    [[ $# -gt $max ]] && max=$#
done

{
    printf "field1 field2 nbrepeated"
    for i in $(seq $max); do printf " %s" time$i; done
    echo " average_time beta"
    for key in "${!t[@]}"; do
        f1=${key%:*}
        f2=${key#*:}
        set -- ${t[$key]}
        f3=$(($# - 1))    # number of intervals between the times
        avg=0
        beta=0
        # don't want to divide by zero if we have only one time
        if [ $f3 -ne 0 ]
        then
            total=$(bc <<< "${@: -1} - $1")    # last time minus first time
            avg=$(bc <<< "scale=2; $total/$f3")
            beta=$(bc <<< "scale=2; $f3/$avg")
        fi
        printf "%d %d %d" $f1 $f2 $f3
        for i in $(seq $max); do
            printf " %.1f" ${1-0}
            shift
        done
        printf " %.2f %.2f" $avg $beta
        echo ""
    done
} | column -t
Output
field1 field2 nbrepeated time1 time2 time3 time4 time5 time6 average_time beta
2 4 4 110.0 124.0 223.0 294.0 383.0 0.0 68.25 0.05
2 3 0 232.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00
1 3 1 159.0 233.0 0.0 0.0 0.0 0.0 74.00 0.01
1 2 2 121.0 224.0 318.0 0.0 0.0 0.0 98.50 0.02
0 4 2 172.0 214.0 294.0 0.0 0.0 0.0 61.00 0.03
0 2 1 95.0 246.0 0.0 0.0 0.0 0.0 151.00 0.00
0 3 0 292.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00
1 4 1 51.0 228.0 0.0 0.0 0.0 0.0 177.00 0.00
0 1 3 1.0 120.0 229.0 331.0 0.0 0.0 110.00 0.02
3 4 5 1.0 84.0 158.0 233.0 294.0 402.0 80.20 0.06