I'm trying to parse my maillog, which contains many lines similar to the following:
Jun 6 17:52:06 host sendmail[30794]: p569q3sX030792: to=<person@recipient.com>, ctladdr=<apache@host.com> (48/48), delay=00:00:03, xdelay=00:00:03, mailer=esmtp, pri=121354, relay=gmail-smtp-in.l.google.com. [1.2.3.4], dsn=2.0.0, stat=Sent (OK 1307354043 x8si28599066ict.63)
The rules I'm trying to apply are:
- The date is always the first 2 words
- The email address always appears between " to=" and ", " (as in " to=person@recipient.com, "); however, the address might be surrounded by <>
There are some lines in the log which do not relate to a recipient, so I'd like to ignore those lines entirely.
The following code works for either rule individually, however I'm having trouble combining them:
if($_ =~ m/\ to=([<>a-zA-Z0-9\.\@]*),\ /g) {
    print "$1\n";
}
if($_ =~ /^+(\S+\s+\S+\s)/g) {
    print "$1\n";
}
As always, I'm not sure whether the regex I'm using above is "best practice" so feel free to point out anything I'm doing badly there too :)
Thanks!
print substr($_, 0, 7), "$1\n" if / to=(.+?), /;
Your date is in a fixed-length format, so you don't need a regular expression to match it.
For the address, what you need is the part between to= and the next comma, so a non-greedy match is just what you need.
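As a rough sketch of how the one-liner above might grow into a loop that skips non-recipient lines and strips the optional <>, something like the following could work. The helper name parse_delivery_line and the sample lines are made up for illustration, and it assumes the usual syslog timestamp in which single-digit days are space-padded, so that "Mmm dd" is always six characters.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns (date, address)
# for a delivery line, or an empty list for lines without a " to=" field.
sub parse_delivery_line {
    my ($line) = @_;

    # Non-greedy match: everything between " to=" and the next comma.
    # Lines that do not relate to a recipient fail to match and are skipped.
    return unless $line =~ / to=(.+?),/;
    my $address = $1;

    # Strip the surrounding angle brackets when they are present.
    $address =~ s/^<(.*)>$/$1/;

    # Assumption: standard syslog timestamps pad single-digit days with a
    # space, so the "Mmm dd" part is always the first six characters.
    my $date = substr($line, 0, 6);

    return ($date, $address);
}

# Illustrative sample lines modelled on the one in the question.
my @sample_lines = (
    'Jun  6 17:52:06 host sendmail[30794]: p569q3sX030792: to=<person@recipient.com>, ctladdr=<apache@host.com> (48/48), delay=00:00:03',
    'Jun  6 17:52:03 host sendmail[30792]: p569q3sX030792: from=<apache@host.com>, size=1354',
);

for my $line (@sample_lines) {
    # An empty return list makes the assignment false, so the line is skipped.
    my ($date, $address) = parse_delivery_line($line) or next;
    print "$date $address\n";
}
```

If your log is not space-padded, the fixed substr offset will not hold and the "first two words" pattern from the question is the safer choice for the date.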
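To match either with a single regex, OR them together using the syntax (regex1|regex2):
((?<=\ to=)[<>a-zA-Z0-9\.\@]*(?=,\ )|^\S+\s+\S+\s)
The outer parentheses ensure the match is still assigned to $1.
The lookbehind (?<=\ to=) and lookahead (?=,\ ) do not capture anything, so the regex captures only your target string. Note that a single match returns only one of the two alternatives; to get both the date and the address from the same line, match with /g in list context.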
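A minimal sketch of using the combined pattern to pull both values from one line, assuming a global match in list context. The function name date_and_recipient and the sample line are illustrative only; the trimming of the trailing space and of the <> is an extra step not in the pattern itself.

```perl
use strict;
use warnings;

# The two patterns joined by alternation. The lookbehind and lookahead keep
# " to=" and ", " out of the captured text.
my $combined = qr/((?<=\ to=)[<>a-zA-Z0-9\.\@]*(?=,\ )|^\S+\s+\S+\s)/;

# Hypothetical helper: returns (date, address), or an empty list when the
# line does not contain exactly one date and one recipient.
sub date_and_recipient {
    my ($line) = @_;

    # With /g in list context, each successful match contributes its $1,
    # so the date (anchored at the start) comes first, then the address.
    my @found = $line =~ /$combined/g;
    return unless @found == 2;

    my ($date, $address) = @found;
    $date    =~ s/\s+$//;         # drop the trailing space matched by \s
    $address =~ s/^<(.*)>$/$1/;   # drop the optional angle brackets
    return ($date, $address);
}

# Illustrative sample line modelled on the one in the question.
my $sample = 'Jun 6 17:52:06 host sendmail[30794]: p569q3sX030792: '
           . 'to=<person@recipient.com>, ctladdr=<apache@host.com> (48/48)';

if (my ($date, $address) = date_and_recipient($sample)) {
    print "$date $address\n";
}
```

Lines with no " to=" field yield only the date match, so the count check filters them out.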