Yesterday I asked a question here about a one-liner, and mjschultz gave me an answer that I instantly fell in love with :) Awk just destroyed the task at hand, parsing a large logfile (500+ MB) in a matter of seconds. Now I'm trying to port my other one-liners to awk.
This is the one in question:
grep "pop3\[" maillog | grep "User logged in" |
egrep -o '([[:digit:]]{1,3}\.){3}[[:digit:]]{1,3}' | sort -u
I need the list of all unique IP addresses using pop3 to connect to the mail server.
This is an example log entry:
Nov 15 00:49:21 hostname pop3[19418]: login: [10.10.10.10] username plaintext User logged in
So I find all the lines containing "pop3[" and filter them for the "User logged in" part. Next I use egrep with a regex to extract the IP addresses, and sort -u to drop the duplicates.
This is what I have so far for my awk version:
awk '/pop3\[.*.User logged in/ {ip[$7]=0} END {for (address in ip) print address}' maillog
This works perfectly, but as always not all log entries are identical. For example, sometimes a hostname is logged before the IP, which moves the IP to the 8th field, like here:
Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in
What would be the best way to catch those entries with awk as well?
As always, thanks in advance for all the great responses. You've taught me so much already :)
AWK code
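Just match your IP format directly. Be careful that there are no other formats in the log that look the same.
/pop3\[.*.User logged in/ {
    # match() returns the position of the first "[a.b.c.d" and sets RSTART/RLENGTH
    where = match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/)
    # RSTART+1 and RLENGTH-1 skip the leading "["; the address becomes an array key
    if (where)
        ip[substr($0, RSTART+1, RLENGTH-1)] = 0
}
END {
    for (address in ip)
        print address
}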
running at ideone
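To see what match() is doing there, here is a small sketch run against the second sample line from the question. The function name extract_ip and the variable line are just names I picked for the illustration. Note that pop3[2232] is not picked up, because the regex wants digits and a dot right after the bracket.

```shell
# Sample line copied from the question (the variant with a hostname before the IP).
line='Nov 15 10:42:40 hostname pop3[2232]: login: hostname.domain.com [20.20.20.20] username plaintext User logged in'

# extract_ip is a hypothetical helper: it reads log lines on stdin and
# prints the first bracketed dotted quad of each line, without the "[".
extract_ip() {
    awk '{
        # match() sets RSTART (start index) and RLENGTH (length of the match)
        if (match($0, /\[[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+/))
            print substr($0, RSTART + 1, RLENGTH - 1)
    }'
}

printf '%s\n' "$line" | extract_ip
```

Since the match runs on the whole line, it does not matter whether the address sits in field 7 or field 8.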
That looks more like Perl territory than Awk to me:
my %ip_addresses = ();
while (<>)
{
    next unless m/pop3\[/;
    next unless m/User logged in/;
    if (my($ip) = $_ =~ m/( \d{1,3} (?: [.] \d{1,3} ){3} )/msx)
    {
        $ip_addresses{$ip} = 1;
    }
}
foreach my $ip (sort keys %ip_addresses)
{
    print "$ip\n";
}
The sort is less than perfect - being alphabetic rather than numeric (so 192.1.168.10 will appear before 9.25.13.26). That can be fixed, of course.
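One way to get a numeric ordering, if you are happy to pipe the addresses through sort(1) as in the original pipeline, is to sort on each octet separately. This is only a sketch: sort_ips is a name made up for the example, and it assumes one bare dotted-quad address per line.

```shell
# sort_ips is a hypothetical helper: reads one IPv4 address per line on stdin.
sort_ips() {
    # -t . makes the dot the field separator;
    # each -kN,Nn key compares octet N as a number, in order of significance.
    sort -t . -k1,1n -k2,2n -k3,3n -k4,4n
}

printf '%s\n' 192.1.168.10 9.25.13.26 10.10.10.10 | sort_ips
```

With these three addresses, 9.25.13.26 comes out first, then 10.10.10.10, then 192.1.168.10.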
After seeing and trying these approaches I got a new idea.
belisarius's code does what I asked for, but since it has to do all the regex matching it's not the fastest one, and speed is what I'm after.
So I came up with this. As you can see, the "problematic" log lines have an extra field, making them 13 fields long instead of the normal 12, so I just delete the extra field. This gives me the correct list of IP addresses. Next I use awk again to delete all duplicate entries:
awk '/pop3\[.*.User logged in/ {{if (NF == 13) $7="";gsub(FS "+",FS)};print $7}' /var/log/maillog |
awk '!($0 in a){a[$0];print}'
Ideone link if you want to see the code in action
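A possible variation on the same idea, as a sketch only: instead of deleting the extra field, count from the end of the line. In both sample entries the address is followed by exactly five fields ("username plaintext User logged in"), so it is $(NF-5) whether the line has 12 or 13 fields. That is an assumption based on the two examples above; it breaks if a username contains spaces or the trailing text differs. The function name unique_pop3_ips is made up for the example, and unlike the version above it strips the brackets and dedupes in the same awk process.

```shell
# unique_pop3_ips is a hypothetical helper: reads log lines from the files
# given as arguments, or from stdin when called without arguments.
unique_pop3_ips() {
    awk '/pop3\[.*User logged in/ {
        addr = $(NF - 5)                          # e.g. "[10.10.10.10]"
        addr = substr(addr, 2, length(addr) - 2)  # drop the surrounding brackets
        if (!seen[addr]++)                        # print each address only once
            print addr
    }' "$@"
}

# Usage (path taken from the command above):
#   unique_pop3_ips /var/log/maillog
```

Addresses come out in order of first appearance, the same as with the second awk in the pipeline above.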