I have a log file where some of the entries look like this:
YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC
and I'm trying to get it into a CSV format:
Date,Time,v1,v2,v3,v4,v5
YY/MM/DD,HH:MM:SS:MMM,XXX,YYY,ZZZ,AAA AND BBB,CCC
I'd like to do this in Perl - speaking personally, I could probably do it far quicker in other languages but I'd really like to expand my horizons a bit.
So far I can get as far as reading the file in and picking out only lines which meet my criteria but I can't seem to get the next stage done. I'll need to splice up the input line but so far I just can't开发者_JS百科 work out how to do this. I've looked at s//
and m//
but they don't really give me what I want. If anyone can advise me how this can be done or give me pointers I'd much appreciate it.
Important points:
- The values in the second part of the line are always in the same order so mapping / re-organising is not necesarily a problem.
- Some of the fields have free text which is not quoted :( but as the labels all start
v<number>=
I'm hoping parsing this should still be a possibility.
Since there is no one delimiter, you'll need to try this a few different ways:
First, split on ' '
, then take the first three values:
my @array = split / /, $line;
my ($date, $time, $constant) = splice @array, 0, 3;
Join the rest of the fields together again, and re-split on v\d+=
to get the values:
my $rest = join ' ', @array;
# $rest should now be "v1=XXX v2=YYY ..."
my @values = split /\s*v\d+=/, $rest;
shift @values; # since the first element in @values will be empty
print join ',', $date, $time, @values;
Edit: Here's another approach that may be easier to follow, and is slightly more efficient. This takes advantage of the fact that your constant text occurs between the date/time and the value list.
# assume that CONSTANT is your constant text
my ($datetime, $valuelist) = split /\s*CONSTANT\s*/, $line;
my ($date, $time) = split / /, $datetime;
my @values = split /\s*v\d+=/, $valuelist;
shift @values;
print join ',', $date, $time, @values, "\n";
What have you tried with regular expressions and how has it failed? A regex with m//
works fine for me:
#!/usr/bin/env perl
use strict;
use warnings;
print "Date,Time,v1,v2,v3,v4,v5\n";
while (my $line = <DATA>) {
my @matched = $line =~ m{^([^ ]+) ([^ ]+).*v1=(.*) v2=(.*) v3=(.*) v4=(.*) v5=(.*)};
print join(',', @matched), "\n";
}
__DATA__
YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC
Two caveats:
1) v1 cannot contain the substring " v2=", v2 cannot contain " v3=", etc., but, with such a loose format, that's something that would likely cause problems for a human attempting to parse it, too.
2) This code assumes that there will always be v1 through v5. If there are fewer than five v*n* fields, the line will fail to match. If there are more, all additional fields will be appended to v5 (including their v*n* tags).
In case the log is fixed-width, you better off using unpack
, you will see its benefits if the log grows very large (performance wise).
精彩评论