
Parsing a log file using perl

开发者 https://www.devze.com 2023-03-05 04:21 Source: the web

I have a log file where some of the entries look like this:

YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC

and I'm trying to get it into a CSV format:

Date,Time,v1,v2,v3,v4,v5
YY/MM/DD,HH:MM:SS:MMM,XXX,YYY,ZZZ,AAA AND BBB,CCC

I'd like to do this in Perl - speaking personally, I could probably do it far quicker in other languages but I'd really like to expand my horizons a bit.

So far I can read the file in and pick out only the lines which meet my criteria, but I can't seem to get the next stage done. I'll need to split up the input line, but so far I just can't work out how to do this. I've looked at s/// and m// but they don't really give me what I want. If anyone can advise me how this can be done or give me pointers, I'd much appreciate it.

Important points:

  • The values in the second part of the line are always in the same order, so mapping / re-organising is not necessarily a problem.
  • Some of the fields have free text which is not quoted :( but as the labels all start v<number>= I'm hoping parsing this should still be a possibility.


Since there is no one delimiter, you'll need to try this a few different ways:

First, split on ' ', then take the first three values:

my @array = split / /, $line;
my ($date, $time, $constant) = splice @array, 0, 3;

Join the rest of the fields together again, and re-split on v\d+= to get the values:

my $rest = join ' ', @array;

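# $rest should now be "v1=XXX v2=YYY ..." (if the constant text contains
# spaces, its remaining words will still be in front of v1=)
my @values = split /\s*v\d+=/, $rest;
shift @values; # the first element is empty, or holds the leftover constant text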

print join ',', $date, $time, @values;
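
For reference, the steps above can be put together into one small script. This is only a sketch: the helper name line_to_csv is made up for illustration, and it takes just the date and time off the front, so that a constant text of any number of words ends up in the discarded first element.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original answer): split on spaces,
# peel off date and time, then re-split the remainder on the v<number>= labels.
sub line_to_csv {
    my ($line) = @_;
    chomp $line;                               # drop the trailing newline
    my @array = split / /, $line;
    my ($date, $time) = splice @array, 0, 2;   # first two tokens
    my $rest = join ' ', @array;               # constant text plus "v1=... v2=..."
    my @values = split /\s*v\d+=/, $rest;
    shift @values;                             # whatever came before v1=
    return join ',', $date, $time, @values;
}

print line_to_csv("YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC\n"), "\n";
```

Note that a free-text value containing something that looks like a label (for example "dev1=") would also be split, so this relies on the labels being unambiguous.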

Edit: Here's another approach that may be easier to follow, and is slightly more efficient. This takes advantage of the fact that your constant text occurs between the date/time and the value list.

# assume that CONSTANT is your constant text
my ($datetime, $valuelist) = split /\s*CONSTANT\s*/, $line;
my ($date, $time) = split / /, $datetime;
my @values = split /\s*v\d+=/, $valuelist;
shift @values;

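# assumes $line was chomp'ed first; the parentheses keep "\n" out of the
# join list, otherwise a stray comma is printed at the end of the line
print join(',', $date, $time, @values), "\n";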
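
As a self-contained sketch of this second approach: the literal '<Some constant text>' below stands in for whatever your real constant text is, and the name line_to_csv_by_constant is invented for the example. The \Q...\E escapes any regex metacharacters the constant text might contain.

```perl
use strict;
use warnings;

# Assumption: this literal stands in for the real constant text.
my $constant = '<Some constant text>';

# Hypothetical helper: split the line around the constant text,
# then split each half separately.
sub line_to_csv_by_constant {
    my ($line) = @_;
    chomp $line;
    # \Q...\E quotes regex metacharacters; the limit of 2 splits only once
    my ($datetime, $valuelist) = split /\s*\Q$constant\E\s*/, $line, 2;
    return unless defined $valuelist;          # constant text not found
    my ($date, $time) = split / /, $datetime;
    my @values = split /\s*v\d+=/, $valuelist;
    shift @values;                             # empty element before v1=
    return join(',', $date, $time, @values);
}

my $csv = line_to_csv_by_constant("YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC\n");
print "$csv\n" if defined $csv;
```

Returning nothing for lines without the constant text also gives you a cheap way to skip entries that don't meet your criteria.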


What have you tried with regular expressions and how has it failed? A regex with m// works fine for me:

#!/usr/bin/env perl

use strict;
use warnings;

print "Date,Time,v1,v2,v3,v4,v5\n";

while (my $line = <DATA>) {
    my @matched = $line =~ m{^([^ ]+) ([^ ]+).*v1=(.*) v2=(.*) v3=(.*) v4=(.*) v5=(.*)};
    print join(',', @matched), "\n";
}

__DATA__
YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC

Two caveats:

1) v1 cannot contain the substring " v2=", v2 cannot contain " v3=", etc., but, with such a loose format, that's something that would likely cause problems for a human attempting to parse it, too.

2) This code assumes that there will always be v1 through v5. If there are fewer than five vN fields, the line will fail to match. If there are more, all additional fields will be appended to v5 (including their vN tags).
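
If the second caveat is a problem, one possible workaround is to collect the fields one at a time with a global match instead of spelling out v1 through v5 in a single pattern. The sketch below is an illustration rather than a drop-in fix: parse_fields is a made-up name, the sample line deliberately omits v3 and v5, and the // operator needs Perl 5.10 or later. The first caveat still applies.

```perl
use strict;
use warnings;

# Hypothetical helper: returns date, time, and a hash ref of label => value.
sub parse_fields {
    my ($line) = @_;
    chomp $line;
    my ($date, $time, $rest) = split / /, $line, 3;
    return unless defined $rest;               # too short to be a log entry
    my %field;
    # Each value runs (non-greedily) up to the next " v<number>=" or end of line.
    while ($rest =~ /\bv(\d+)=(.*?)(?=\s+v\d+=|$)/g) {
        $field{"v$1"} = $2;
    }
    return ($date, $time, \%field);
}

my ($date, $time, $field) = parse_fields("YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v4=AAA AND BBB\n");

# Labels that are missing from the line become empty CSV cells.
print join(',', $date, $time, map { $field->{"v$_"} // '' } 1 .. 5), "\n";
```

Because the values are keyed by label, fields beyond v5 are simply ignored by the final map instead of being glued onto v5.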


If the log is fixed-width, you are better off using unpack. You will see its benefit, performance-wise, if the log grows very large.
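
As a rough illustration, and assuming only that the date is always 8 characters and the time always 12 (as in YY/MM/DD HH:MM:SS:MMM), the fixed-width prefix could be taken apart like this. The sample line and its values are invented; the rest of the line is not fixed-width in the question, so it would still need to be split on the labels.

```perl
use strict;
use warnings;

# Invented sample line in the question's format.
my $line = "23/03/05 04:21:07:123 <Some constant text> v1=XXX v2=YYY";

# A8  = 8-character date, x = skip one separator byte,
# A12 = 12-character time, A* = everything that remains.
my ($date, $time, $rest) = unpack 'A8 x A12 x A*', $line;

print "$date,$time\n";
```

Be aware that unpack dies if a line is shorter than the template requires, so lines that don't meet your criteria should be filtered out first.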
