I have a bunch of log files which are pure text. Here is an example of one...
Overall Failures Log
SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings
HW Failures - 03.09.2010 - /logs/hwfailures.txt - 42 errors - 25 warnings
SW Failures - 03.10.2010 - /logs/swfailures.txt - 32 errors - 27 warnings
HW Failures - 03.10.2010 - /logs/hwfailures.txt - 11 errors - 31 warnings
These files can get quite large and contain a lot of other information. I'd like to produce an HTML file from this log that will add links to key portions and allow the user to open up other log files as a result...
SW Failures - 03.09.2010 - <a href="/logs/swfailures.txt">/logs/swfailures.txt</a> - 23 errors - 24 warnings
This is greatly simplified, as I would like to add many more links and other HTML elements. My question is: what is the best way to do this? If the files are large, should I generate the HTML before serving it to the user, or will JSP do? Should I use Perl or another scripting language to do this? What are your thoughts and experiences?
Here is a simple example using Perl's HTML::Template:
#!/usr/bin/perl
use strict; use warnings;
use HTML::Template;
my $tmpl = HTML::Template->new(scalarref => \ <<EOTMPL
<!DOCTYPE HTML>
<html><head><title>HTMLized Log</title>
<style type="text/css">
#log li { font-family: "Courier New" }
.errors { background:yellow; color:red }
.warnings { background:#3cf; color:blue }
</style>
</head><body>
<ol id="log">
<TMPL_LOOP LOG>
<li><span class="type"><TMPL_VAR TYPE></span>
<span class="date"><TMPL_VAR DATE></span>
<a href="<TMPL_VAR FILE>"><TMPL_VAR FILE></a>
<span class="errors"><TMPL_VAR ERRORS></span>
<span class="warnings"><TMPL_VAR WARNINGS></span>
</li>
</TMPL_LOOP>
</ol></body></html>
EOTMPL
);
my @log;
my @fields = qw( TYPE DATE FILE ERRORS WARNINGS );
while ( my $entry = <DATA> ) {
chomp $entry;
last unless $entry =~ /\S/;
my %entry;
@entry{ @fields } = split / - /, $entry;
push @log, \%entry;
}
$tmpl->param(LOG => \@log);
print $tmpl->output;
__DATA__
SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings
HW Failures - 03.09.2010 - /logs/hwfailures.txt - 42 errors - 25 warnings
SW Failures - 03.10.2010 - /logs/swfailures.txt - 32 errors - 27 warnings
HW Failures - 03.10.2010 - /logs/hwfailures.txt - 11 errors - 31 warnings
I like awk because of its automatic field parsing:
/failures\.txt/ {
$6="<a href=\"" $6 "\">" $6 "</a><br>"
}
{
print
}
I'd use Python regular expressions.
>>> import re
>>> a = re.compile(r'[SH]W Failures - \d\d\.\d\d\.\d\d\d\d - (.*) - \d+ errors - \d+ warnings')
>>> line = 'SW Failures - 03.09.2010 - /logs/swfailures.txt - 23 errors - 24 warnings'
>>> b = a.match(line)
>>> b
<_sre.SRE_Match object at 0x7ff34160>
>>> b.groups()
('/logs/swfailures.txt',)
>>> line.replace(b.group(1), '<a href="%s">%s</a>' % (b.group(1), b.group(1)))
'SW Failures - 03.09.2010 - <a href="/logs/swfailures.txt">/logs/swfailures.txt</a> - 23 errors - 24 warnings'
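The same idea can be stretched to a whole file instead of a single string. What follows is only a rough sketch, assuming every interesting line follows the layout of the sample log in the question; the names LOG_LINE, linkify and log_to_html are made up for illustration. Lines that do not match are passed through unchanged apart from HTML escaping, and the output is produced one line at a time so a large log never has to be held in memory.

```python
import html
import re

# Assumed line layout, taken from the sample log in the question:
#   <type> - <dd.mm.yyyy> - <path> - <n> errors - <m> warnings
LOG_LINE = re.compile(
    r'^(?P<type>[SH]W Failures) - (?P<date>\d\d\.\d\d\.\d{4}) - '
    r'(?P<path>\S+) - (?P<errors>\d+) errors - (?P<warnings>\d+) warnings$'
)


def linkify(line):
    """Return one HTML-escaped line, with the log path turned into a link."""
    line = line.rstrip('\n')
    m = LOG_LINE.match(line)
    if m is None:
        # Not a summary line: escape it and pass it through as-is.
        return html.escape(line)
    path = html.escape(m.group('path'), quote=True)
    return '%s - %s - <a href="%s">%s</a> - %s errors - %s warnings' % (
        html.escape(m.group('type')), m.group('date'), path, path,
        m.group('errors'), m.group('warnings'))


def log_to_html(lines):
    """Yield the lines of a minimal HTML page, one log line at a time."""
    yield '<!DOCTYPE html>'
    yield '<html><head><title>Log</title></head><body><pre>'
    for line in lines:
        yield linkify(line)
    yield '</pre></body></html>'
```

Since an open file is already an iterable of lines, it can be passed straight to log_to_html and the result written out line by line, which would suit generating the HTML ahead of time rather than on every request.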
pygmentize can handle some formats, although you may need to whip up a custom lexer for most cases.