I'm going to write a log parser for the Exim4 MTA, and I have a couple of questions. (I know that there is an exilog program.)
Question 1: What is the better way to parse a line? (There are about 5 GB of such lines :D) I've got this $line:
2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) <to@some.email> => H=[321.123.321.123] T="Hello this is a test"
and I want to get all of these fields into variables.
Right now I'm using something like ($var, $var2) = ($line =~ /somecoolregexp/); is that fast/good, or should I use something else?
Well, it depends on what you want to do with the data.
Assuming you have a big while (<>) { ... } loop around this, so that each line is in $_, you can get the simplest parsing by just using split, which with no arguments splits $_ on whitespace:
my @fields = split;
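Since the question is really about speed on ~5 GB of input, one way to settle split versus a single capturing regex is to measure both on a representative line with Perl's core Benchmark module. This is only a sketch: by_split and by_regex are illustrative names, and the pattern in by_regex is a hypothetical stand-in for the question's "somecoolregexp". Timings depend on your machine, Perl version, and real data, so run it on lines taken from your own logs.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line taken from the question (single quotes avoid interpolating @some).
my $line = '2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) '
         . '<to@some.email> => H=[321.123.321.123] T="Hello this is a test"';

# Approach 1: plain whitespace split, the same as a bare split on $_.
sub by_split {
    my ($l) = @_;
    return split ' ', $l;
}

# Approach 2: one capturing regex (hypothetical stand-in for the question's).
# Returns date, time, id, host, from, to, dest.
sub by_regex {
    my ($l) = @_;
    return $l =~ /^(\S+) (\S+) (\S+) <([^>]*)> \(([^)]*)\) <([^>]*)> \S+ (\S+)/;
}

# A negative count means: run each sub for at least 1 CPU second,
# then print a table of rates and relative speed.
cmpthese(-1, {
    split => sub { my @f = by_split($line) },
    regex => sub { my @f = by_regex($line) },
});
```

Note that the two subs do not return the same thing: split leaves the brackets and parentheses on the fields and also breaks the T="..." subject into words, while the regex already strips the delimiters. Compare them on the amount of work you actually need per line.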
The next level would be to add a bit of meaning by naming the fields:
my ($date, $time, $id, $host, $from, $to, undef, $dest) = split;
(Note that you can assign to undef if you want to ignore a field; here it skips the => marker.)
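A minimal, self-contained sketch of that loop, assuming an in-memory filehandle stands in for your real log files (with real files you would keep while (<>) and pass the file names on the command line). The @records array of hashes is just an illustrative container, not something the answer prescribes; for 5 GB you would normally process each record inside the loop rather than keep them all.

```perl
use strict;
use warnings;

# In-memory stand-in for the real log; in practice this is while (<>).
my $log = <<'END';
2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) <to@some.email> => H=[321.123.321.123] T="Hello this is a test"
END

open my $fh, '<', \$log or die "cannot open in-memory log: $!";

my @records;
while (<$fh>) {
    # split with no arguments splits $_ on runs of whitespace and
    # discards the trailing newline; fields past $dest are dropped.
    my ($date, $time, $id, $host, $from, $to, undef, $dest) = split;
    push @records, {
        date => $date, time => $time, id   => $id,
        host => $host, from => $from, to   => $to, dest => $dest,
    };
}
close $fh;

for my $r (@records) {
    print "$r->{id}: $r->{from} -> $r->{to} via $r->{dest}\n";
}
```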
Finally, you can clean up a lot of the cruft by using a regular expression. You can also combine the split above with smaller regexps to clean each field individually.
my ($datetime, $id, $host, $from, $to, $dest) =
/([\d-]+ \s+ [\d:]+) \s+ # date and time; plain spaces are ignored here, hence \s+
(\S+) \s+ # message id, just a block of non-whitespace
<(.*?)> \s+ # hostname in angle brackets, .*? is a non-greedy slurp
\((.*?)\) \s+ # from email in parens
<(.*?)> \s+ # to email in angle brackets
\S+ \s+ # the "=>" separator between to-email and dest
(\S+) # last bit, could be improved to (\w)=\[(.*?)\]
/x; # /x lets us break all of this up, so it's a bit more readable
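The same regex can be packaged as a sub and run against the sample line from the question. parse_line is a hypothetical name used only for illustration. Note that under /x a literal space in the pattern is ignored, so the separator between the date and the time has to be written as \s+ (or an escaped space) for the match to succeed.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = '2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) '
         . '<to@some.email> => H=[321.123.321.123] T="Hello this is a test"';

# Hypothetical helper: returns the six fields, or an empty list on no match.
sub parse_line {
    my ($l) = @_;
    return $l =~ /
        ([\d-]+ \s+ [\d:]+) \s+   # date and time, whitespace written as \s+
        (\S+)               \s+   # message id
        <(.*?)>             \s+   # hostname in angle brackets
        \((.*?)\)           \s+   # from address in parentheses
        <(.*?)>             \s+   # to address in angle brackets
        \S+                 \s+   # the "=>" marker
        (\S+)                     # destination, such as the H= host field
    /x;
}

my ($datetime, $id, $host, $from, $to, $dest) = parse_line($line)
    or die "line did not match\n";
print "$datetime | $id | $host | $from | $to | $dest\n";
```

If a line does not match, the sub returns an empty list, so the or die catches malformed lines; over 5 GB of logs you would more likely count or log those lines than stop.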
Of course, you can keep on taking this to all sorts of silliness, but if you're going to start doing more specific parsing of these fields, I'd go with the initial split followed by broken-out field parsing. For example:
my ($date, $time, $id, $host, $from, $to, undef, $dest) = split;
my ($year, $month, $day) = split(/-/, $date);
my ($hour, $min, $sec) = split(/:/, $time);
my ($from_user, $from_host) = ( $from =~ /\( ([^\@]+) \@ (.*) \)/x ); # the sender is in parens in this log format
...etc...
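Putting the split-then-refine approach together, here is a sketch that assumes every line looks like the sample in the question. parse_fields and its hash keys are hypothetical names; the destination parsing uses the (\w)=\[(.*?)\] refinement mentioned in the regex comment above.

```perl
use strict;
use warnings;

# Sample line from the question.
my $line = '2011-12-24 12:32:12 MeSSag3-Id-Ye <hostname> (from@some.email) '
         . '<to@some.email> => H=[321.123.321.123] T="Hello this is a test"';

# Hypothetical helper: split on whitespace, then refine individual fields.
sub parse_fields {
    my ($l) = @_;
    my ($date, $time, $id, $host, $from, $to, undef, $dest) = split ' ', $l;
    return unless defined $dest;    # too few fields for this line format

    my ($year, $month, $day) = split /-/, $date;
    my ($hour, $min, $sec)   = split /:/, $time;

    # The sender is wrapped in parentheses in the sample line.
    my ($from_user, $from_host) = $from =~ /^\( ([^\@]+) \@ (.*) \)$/x;

    # Destination such as H=[1.2.3.4]: one letter key, bracketed value.
    my ($dest_key, $dest_val) = $dest =~ /^(\w)=\[(.*?)\]$/;

    return {
        year => $year, month => $month, day => $day,
        hour => $hour, min   => $min,   sec => $sec,
        id   => $id,   host  => $host,  to  => $to,
        from_user => $from_user, from_host => $from_host,
        dest_key  => $dest_key,  dest_val  => $dest_val,
    };
}

my $rec = parse_fields($line) or die "unexpected line format\n";
printf "%s-%s-%s %s sender=%s\@%s dest=%s\n",
    @{$rec}{qw(year month day id from_user from_host dest_val)};
```

The design choice here is that split does the cheap, coarse work on every line, and the small regexes only run on the fields you actually care about; fields you never inspect cost nothing extra.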