Elegantly Parsing Rigid Data in Perl

Developer https://www.devze.com 2023-01-29 19:52 Source: the web


I'm working with a large dataset that basically boils down to something like this:

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
); # new-lines are either CR+LF, LF, or CR

Based on the example above, let's assume that the following constraints are in effect:

There will always be 5 lines of data.
Data in each line is enclosed in a single tag such as <foo>...</foo>.
Data will contain no nested tags.
All lines use the same tag (e.g. foo) to enclose their data.
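Under these constraints, the tag name does not strictly need to be hard-coded. A minimal sketch, assuming the tag name consists only of word characters: capture it with `(\w+)` and require the closing tag to match through the backreference `\1`, then check the count and same-tag constraints explicitly. Because the match only looks at the tags, the mix of CR+LF, LF, and CR line endings does not matter.

```perl
use strict;
use warnings;

# Sample input with a mix of CR+LF, LF, and CR line endings.
my $input = "\r\n<foo>111</foo>\r\n<foo>222</foo>\n<foo>333</foo>\r<foo></foo>\n<foo>555</foo>\n";

# Each match yields (tag name, content); \1 forces the closing tag to match the opening one.
my @pairs = $input =~ m{<(\w+)>(.*?)</\1>}g;

# Split the flat (tag, content, tag, content, ...) list into tags and contents.
my @tags = @pairs[ grep { $_ % 2 == 0 } 0 .. $#pairs ];
my @vals = @pairs[ grep { $_ % 2 == 1 } 0 .. $#pairs ];

# Enforce the stated constraints: exactly 5 values, all wrapped in the same tag.
die "Expected 5 values, got ", scalar(@vals), "\n" unless @vals == 5;
die "Mixed tags: @tags\n" if grep { $_ ne $tags[0] } @tags;

print join(',', @vals), "\n";    # 111,222,333,,555
```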

Ultimately, taking the above as the data source, I'd like to end up with something akin to this:

my %values = (
  one   => '111',
  two   => '222',
  three => '333',
  four  => '',
  five  => '555'
);

This is my attempt:

my @vals = $input =~ m!<foo>(.*?)</foo>!ig;

if (scalar @vals != 5) {
  die "Expected 5 values, got " . scalar(@vals) . "\n";
}

my %values = (
  one   => shift @vals,
  two   => shift @vals,
  three => shift @vals,
  four  => shift @vals,
  five  => shift @vals
);

This works as I want; however, it looks ugly and is not very flexible. Unfortunately, this is the best I can do for now, since I'm new to Perl.

So, given the above constraints, what's a more elegant way to do this?

Merging two arrays into a hash:

my @keys = qw/one two three/;
my @values = qw/alpha beta gamma/;

my %hash; 
@hash{@keys} = @values;
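Applied to the question, the hash slice replaces the five `shift` calls. A minimal sketch that keeps the asker's key names; it also adds an explicit length check, because a hash slice assignment never complains on its own (keys without a value silently become `undef`, and surplus values are dropped).

```perl
use strict;
use warnings;

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
);

# Key names in the order the values appear in the input.
my @keys = qw(one two three four five);

# Capture every tag body; an empty <foo></foo> yields ''.
my @vals = $input =~ m{<foo>(.*?)</foo>}gi;

# The slice assignment below won't detect a count mismatch, so check here.
die "Expected ", scalar(@keys), " values, got ", scalar(@vals), "\n"
    unless @vals == @keys;

my %values;
@values{@keys} = @vals;    # one => '111', two => '222', ..., four => '', five => '555'

print "$_ => '$values{$_}'\n" for @keys;
```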

First, take another look at:

my %values = (
  one   => '111',
  two   => '222',
  three => '333',
  four  => '',
  five  => '555'
);

This data structure associates an integer with a piece of data. But there is already a built-in data structure that serves the same purpose: arrays.

So, use arrays. Instead of writing $values{ one }, you would write $values[ 0 ], and the mapping between integers and data values would be transparent.
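A minimal sketch of the array-based version, assuming the same sample input. Note that indices start at 0, so the value the question calls `four` lives in `$values[3]`.

```perl
use strict;
use warnings;

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
);

# The position in the array is the key: $values[0] is the first value, and so on.
my @values = $input =~ m{<foo>(.*?)</foo>}gi;
die "Expected 5 values, got ", scalar(@values), "\n" unless @values == 5;

print "first: $values[0]\n";       # first: 111
print "fourth: '$values[3]'\n";    # fourth: ''
print "last: $values[-1]\n";       # last: 555
```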

If the keys are something other than integers, you can do:

use strict; use warnings;

my @keys = qw(a b c d e);

my $input = q(
<foo>111</foo>
<foo>222</foo>
<foo>333</foo>
<foo></foo>
<foo>555</foo>
); # new-lines are either CR+LF, LF, or CR

my %values;

# hash slice
@values{ @keys } = $input =~ m{ <foo> (.*?) </foo>}gix;

use YAML;
print Dump \%values;

Output:

---
a: 111
b: 222
c: 333
d: ''
e: 555
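One caveat with this hash slice: if the input ever contains fewer (or more) tags than there are keys, the assignment still succeeds. Keys without a matching value get `undef`, and surplus values are discarded. A small demonstration using hypothetical truncated input:

```perl
use strict;
use warnings;

my @keys = qw(a b c d e);

# Hypothetical malformed input: only three tags instead of five.
my $input = "<foo>111</foo>\n<foo>222</foo>\n<foo>333</foo>\n";

my %values;
@values{ @keys } = $input =~ m{ <foo> (.*?) </foo>}gix;

# All five keys exist, but the last two have no value.
for my $k (@keys) {
    printf "%s: %s\n", $k, defined $values{$k} ? $values{$k} : 'undef';
}
```

Comparing the number of matches with `scalar @keys` before doing the slice assignment catches this case.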

Oh, something like this, give or take?

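use Number::Spell;    # CPAN module; exports spell_number()
my @lines = grep { /<foo>/ } split /\r\n|\r|\n/, $input;  # keep only tagged lines, including the empty <foo></foo>
s|</?foo>||g for @lines;                                  # strip the tags
my $i = 1;                                                # start at 1 so the keys run one..five, not zero..four
my %hash = map { spell_number($i++) => $_ } @lines;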

Hmm, I can make this better.

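use Number::Spell;
my $i = 1;    # start at 1 so the first key is "one"
# s///g returns the number of tags removed, so the empty <foo></foo> line is kept
my %hash = map { s|</?foo>||g ? (spell_number($i++) => $_) : () }
           split /\r\n|\r|\n/, $input;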

Edit: whoops, I had @lines instead of $input in the second snippet. Use caution: I have only typed out this code; I have not written a unit test.
