Elegantly Parsing Rigid Data in Perl

Developer https://www.devze.com 2023-01-29 19:52 Source: web
I'm working with a large dataset that basically boils down to something like this:

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
); # new-lines are either CR+LF, LF, or CR

Based on the example above, let's assume that the following constraints are in effect:

  • There will always be 5 lines of data.
  • Data in each line is enclosed in a single tag such as <foo>...</foo>.
  • Data will contain no nested tags.
  • All lines use the same tag (e.g. foo) to enclose their data.

Ultimately, taking the above as the data source, I'd like to end up with something akin to this:

my %values = (
  one   => '111',
  two   => '222',
  three => '333',
  four  => '',
  five  => '555'
);

This is my attempt:

my @vals = $input =~ m!<foo>(.*?)</foo>!ig;

if (scalar @vals != 5) {
  # panic
}

my %values = (
  one   => shift @vals,
  two   => shift @vals,
  three => shift @vals,
  four  => shift @vals,
  five  => shift @vals
);

This works as I want, but it looks ugly and is not very flexible. Unfortunately, this is the best I can do for now since I'm new to Perl.

So, given the above constraints, what's a more elegant way to do this?


Merging two arrays into a hash:

my @keys = qw/one two three/;
my @values = qw/alpha beta gamma/;

my %hash; 
@hash{@keys} = @values;


First, take another look at:

my %values = (
  one   => '111',
  two   => '222',
  three => '333',
  four  => '',
  five  => '555'
);

This data structure associates an integer (spelled out as a word) with a piece of data. But there is already a built-in data structure that serves the same purpose: the array.

So, use arrays. Instead of writing $values{ one }, you would write $values[ 0 ], and the mapping between integers and data values would be transparent.
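A minimal sketch of the array approach, assuming the same $input as in the question. The variable name @values and the wording of the error message are illustrative, not part of the original answer:

```perl
use strict;
use warnings;

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
);

# In list context, a /g match returns every captured group in order.
my @values = $input =~ m{<foo>(.*?)</foo>}gi;

# Enforce the "always 5 lines" constraint from the question.
die "expected 5 values, got " . scalar(@values) . "\n" unless @values == 5;

print "first:  '$values[0]'\n";   # data from the first line
print "fourth: '$values[3]'\n";   # data from the empty <foo></foo> line
```

Indexes start at 0, so the fourth line's data lives in $values[3].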

If the keys are something other than integers, you can do:

use strict; use warnings;

my @keys = qw(a b c d e);

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
); # new-lines are either CR+LF, LF, or CR

my %values;

# hash slice
@values{ @keys } = $input =~ m{ <foo> (.*?) </foo>}gix;

use YAML;
print Dump \%values;

Output:

---
a: 111
b: 222
c: 333
d: ''
e: 555
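The hash slice above does not include the "panic" check from the question. Here is a sketch of one way to add it, assuming the keys one through five from the question; @found and the error text are made-up names for illustration. The sample input mixes CR+LF, CR, and LF line endings on purpose:

```perl
use strict;
use warnings;

my @keys  = qw(one two three four five);
my $input = "<foo>111</foo>\r\n<foo>222</foo>\r<foo>333</foo>\n"
          . "<foo></foo>\n<foo>555</foo>\n";

# Collect every captured value; the pattern never spans a line break,
# so the kind of newline in the input does not matter here.
my @found = $input =~ m{<foo>(.*?)</foo>}gi;

# Without this check, a short match list would leave trailing keys undef,
# and extra matches would be silently ignored by the slice assignment.
die sprintf("expected %d values, got %d\n", scalar @keys, scalar @found)
    unless @found == @keys;

my %values;
@values{@keys} = @found;    # hash slice: pair keys with values by position

printf "%-5s => '%s'\n", $_, $values{$_} for @keys;
```

Comparing the two arrays with == puts both in scalar context, so the test compares their lengths.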


Oh, something like this give or take?

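use Number::Spell;
# Split on CR+LF, LF, or CR; keep every tagged line, even the empty <foo></foo>
my @lines = grep { /<foo>/i } split /\r\n|\n|\r/, $input;
s|</?foo>||gi for @lines;   # strip the opening and closing tags
my $i = 1;                  # start at 1 so the keys run one .. five
my %hash = map { spell_number($i++) => $_ } @lines;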

Hmm, I can make this better.

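use Number::Spell;
my $i = 1;   # start at 1 so the keys run one .. five
# Match on the tag rather than on truthiness, so an empty value is kept
my %hash = map { m|<foo>(.*?)</foo>|i ? (spell_number($i++) => $1) : () }
           split /\r\n|\n|\r/, $input;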

Edit: whoops, I had @lines instead of $input in the second snippet. Use caution: I have only typed out this code and have not written a unit test.
