Awk command for combining lines and summarizing them

Developer https://www.devze.com 2023-02-22 18:05 Source: Internet

This is the format that I have.

Source IP       Destination IP    Received Sent
192.168.0.1     10.10.10.1        3412     341
192.168.0.1     10.10.10.1        341      43
192.168.0.1     10.22.22.2        34       334
192.168.0.1     192.168.9.3       34       243

But I have a very large file of these. I basically want the total bandwidth for each source IP, so I need to group the rows by unique source IP and then sum the Received column and the Sent column within each group. The end result would be:

source ip - total received packets - total sent packets

It would also be nice to group by unique source and destination IP pairs, so that I could also get:

source ip - destination ip - total received packets - total sent packets

Any help would be greatly appreciated.

Just looking at the source IP:

awk '
    NR == 1 {next}
    {
        recv[$1] += $3
        sent[$1] += $4
    }
    END {for (ip in recv) printf("%s - %d - %d\n", ip, recv[ip], sent[ip])}
' filename

For source/destination pairs, just a slight modification:

awk '
    NR == 1 {next}
    {
        key = $1 " - " $2
        recv[key] += $3
        sent[key] += $4
    }
    END {for (key in recv) printf("%s - %d - %d\n", key, recv[key], sent[key])}
' filename
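
As a quick sanity check, both scripts above can be run against the four sample rows from the question. This is only a sketch: `sample.txt`, `source_totals` and `pair_totals` are names made up for the illustration, and the trailing `sort` is an addition, included because awk does not guarantee any particular order for `for (key in array)`.

```shell
# Recreate the sample from the question in a scratch file (hypothetical name).
cat > sample.txt <<'EOF'
Source IP       Destination IP    Received Sent
192.168.0.1     10.10.10.1        3412     341
192.168.0.1     10.10.10.1        341      43
192.168.0.1     10.22.22.2        34       334
192.168.0.1     192.168.9.3       34       243
EOF

# Totals per source IP (header skipped via NR == 1).
source_totals() {
    awk '
        NR == 1 {next}
        {
            recv[$1] += $3
            sent[$1] += $4
        }
        END {for (ip in recv) printf("%s - %d - %d\n", ip, recv[ip], sent[ip])}
    ' "$1" | LC_ALL=C sort
}

# Totals per source/destination pair.
pair_totals() {
    awk '
        NR == 1 {next}
        {
            key = $1 " - " $2
            recv[key] += $3
            sent[key] += $4
        }
        END {for (key in recv) printf("%s - %d - %d\n", key, recv[key], sent[key])}
    ' "$1" | LC_ALL=C sort
}

source_totals sample.txt
# 192.168.0.1 - 3821 - 961

pair_totals sample.txt
# 192.168.0.1 - 10.10.10.1 - 3753 - 384
# 192.168.0.1 - 10.22.22.2 - 34 - 334
# 192.168.0.1 - 192.168.9.3 - 34 - 243
```

Sorting the output costs little here because the summary is far smaller than the input file; drop it if row order does not matter.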

Ruby (1.9+):

#!/usr/bin/env ruby      
hash_recv=Hash.new(0)
hash_sent=Hash.new(0)
hash_src_dst_recv=Hash.new(0)
hash_src_dst_sent=Hash.new(0)
f=File.open("file")
f.readline
f.each do |line|
    s = line.split
    hash_recv[s[0]] += s[2].to_i
    hash_sent[s[0]] +=  s[-1].to_i
    hash_src_dst_recv[ s[0,2] ] +=  s[2].to_i
    hash_src_dst_sent[ s[0,2] ] +=  s[-1].to_i
end
f.close
p hash_recv
p hash_sent
p hash_src_dst_recv
p hash_src_dst_sent

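Test run (judging by the totals, the input used here had 192.168.168.0.1 as the source IP on its last row, so that row is reported under a separate key):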

$ ruby test.rb
{"192.168.0.1"=>3787, "192.168.168.0.1"=>34}
{"192.168.0.1"=>718, "192.168.168.0.1"=>243}
{["192.168.0.1", "10.10.10.1"]=>3753, ["192.168.0.1", "10.22.22.2"]=>34, ["192.168.168.0.1", "192.168.9.3"]=>34}
{["192.168.0.1", "10.10.10.1"]=>384, ["192.168.0.1", "10.22.22.2"]=>334, ["192.168.168.0.1", "192.168.9.3"]=>243}

I would do it like this (formatted over several lines here, but you could write it on one line):

# tail drops the header line so it is not sorted in with the data
tail -n +2 file.txt | sort | awk '
    $1 != lastip {
        # source IP changed: flush the finished group, then start a new one
        if (NR > 1) print lastip ": " sum_r ", " sum_s
        lastip = $1; sum_r = 0; sum_s = 0
    }
    { sum_r += $3; sum_s += $4 }
    END { if (NR > 0) print lastip ": " sum_r ", " sum_s }'

 awk 'FNR == 1 { next }          # skip the header line on both passes
      NR == FNR {                # first pass: accumulate totals per source/destination pair
          Received[$1,$2] += $3; Sent[$1,$2] += $4
          next
      }
      ($1,$2) in Received {      # second pass: print each pair once, in input order
          print $1 " " $2 " " Received[$1,$2] " " Sent[$1,$2]
          delete Received[$1,$2]
      }' InputFile.txt InputFile.txt

InputFile.txt is named twice on the command line, so awk reads it twice. During the first read (where NR == FNR holds) the two arrays are built. During the second read each source/destination combination is printed the first time it is seen, and its entry is then deleted so that it is not printed again.

Glenn's solution is much better: it reads the file only once.
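
If the input order matters (which is what the two-pass version buys by reading the file twice), it can also be kept in a single pass by remembering the order in which each pair first appears. A minimal sketch, not taken from any of the answers; `shuffled.txt`, `ordered_pair_totals` and the `order`/`n` variables are hypothetical names, and the rows are the ones from the question, deliberately reordered:

```shell
# Sample rows from the question, shuffled so that input order differs from sorted order.
cat > shuffled.txt <<'EOF'
Source IP       Destination IP    Received Sent
192.168.0.1     192.168.9.3       34       243
192.168.0.1     10.10.10.1        3412     341
192.168.0.1     10.22.22.2        34       334
192.168.0.1     10.10.10.1        341      43
EOF

ordered_pair_totals() {
    awk '
        NR == 1 { next }                      # skip the header line
        {
            # remember each pair the first time it appears
            if (!(($1, $2) in recv)) order[++n] = $1 SUBSEP $2
            recv[$1, $2] += $3
            sent[$1, $2] += $4
        }
        END {
            # walk the pairs in first-seen order instead of for (key in recv)
            for (i = 1; i <= n; i++) {
                split(order[i], ip, SUBSEP)
                print ip[1], ip[2], recv[order[i]], sent[order[i]]
            }
        }
    ' "$1"
}

ordered_pair_totals shuffled.txt
# 192.168.0.1 192.168.9.3 34 243
# 192.168.0.1 10.10.10.1 3753 384
# 192.168.0.1 10.22.22.2 34 334
```

The trade-off is one extra array holding a key per distinct pair, in exchange for not reading a very large file a second time.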