
When a pattern repeats within an expression, how do I extract the last occurrence in Perl?

Developer https://www.devze.com 2023-03-05 19:34 Source: the web
The value of $s is dynamic. I need to extract the value that occurs after the last | inside each [...] group.

my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|Coffee]";
my @parts = split(/\]/, $s);
foreach my $part (@parts)
{
    # Need to extract the values that occur after the last '|'
    # (for example: !, .1iit, 10:48AM, Calculator, Coffee)
    # and store each of the values separately in a hash     
}

Could someone help me out with this?

Thanks,


It is best to transform the string into a more useful data structure and then take the elements you need. Why? Right now you need the last element, but next time you may need some other part. Since it's no harder to do it right, why not?

#!/usr/bin/perl

use strict;
use warnings;

# Only needed for Dumper
use Data::Dumper;

my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|Coffee]";

# Extract each group between []
# Then transform each group into an array reference by splitting on |
my @groups = map { [ split /\|/ ] } ($s =~ /\[([^\]]*)\]/g);

# Inspect the data structure
print Dumper \@groups;

# Print only the last element of each sub-array
print "$_\n" for map {$_->[-1]} @groups;

If needed, the third elements of the sub-arrays could quite easily be transformed into hashrefs too. However, since that wasn't needed, I leave that as an exercise for the reader (I always love saying that when I get the chance!).

Edit: since I found it interesting I ended up creating these hashrefs, here is the code that would replace the my @groups line:

my @groups = map { [ map { /\{([^\}]*)\}/ ? { split /(?:=|,)/, $1 } : $_ } (split /\|/) ] } ($s =~ /\[([^\]]*)\]/g);

Or, more thoroughly commented (map expressions are read from the back, so the comments start at the bottom and follow by number; comments like #/N pair with those like #N):

my @groups = map { #/1
  [ #/2
    map { #/3 
      /\{([^\}]*)\}/ #4 ... and if any element (separated by pipes in #3) 
                     #      is surrounded by curly braces
        ? { #5 ... then return a hash ref
            split /(?:=|,)/, $1 #6 ... whose elements are given 
                                #      pairwise between '=' or ',' signs
          } #/5
        : $_ #7 ... otherwise (from 'if' in #4 ) return the element as is
    } (split /\|/) #3 ... where each element is separated by pipes (i.e. |)
  ] #2 ... return an array ref
} ($s =~ /\[([^\]]*)\]/g); #1 For each element between sqr braces (i.e. [])
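
To read values back out of the nested structure, index into each sub-array and then into the hash reference. The following is a minimal sketch rather than part of the original answer: it uses a shortened two-group input, and it writes the separator pattern as [=,] instead of (?:=|,), which should match the same characters.

```perl
use strict;
use warnings;

# Shortened two-group sample in the same format as the question's $s.
my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][66|0|{A=145,B=450,C=49,D=14}|Coffee]";

# One array ref per [...] group; any {...} field becomes a hash ref.
my @groups = map {
  [ map { /\{([^\}]*)\}/ ? { split /[=,]/, $1 } : $_ } (split /\|/) ]
} ($s =~ /\[([^\]]*)\]/g);

for my $group (@groups) {
  my $attrs = $group->[2];   # the hash ref built from {A=...,B=...}
  my $label = $group->[-1];  # the last pipe-separated field
  print "$label: A=$attrs->{A}, D=$attrs->{D}\n";
}
# prints:
# !: A=145, D=18
# Coffee: A=145, D=14
```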


The generic way:

my @subparts = split /\|/, $part;
my $tail = $subparts[-1];   # same element as $subparts[$#subparts]

If you only ever need the last part separately:

$part =~ /([^\|]*)$/ and $tail = $1;


# Anchor on the last '|': [^|]+ cannot cross a pipe, so only the final field matches.
my ($value) = $part =~ m/\|([^|]+)$/;

print "$part => $value\n";

and another way:

my $s =
"[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|Coffee]";
my @parts = $s =~ m/\|([^|]+)]/g;

print join( "\n", @parts );


Since you insist on a regex:

my @matches = $s =~ /\|([^|]+?)]/g;

In list context, /g returns every capture, so all of the matches end up in the array @matches.
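
The behaviour of /g depends on context, which may be worth seeing side by side. This is a small sketch with a shortened two-group input; the %seen hash is a hypothetical name used here only to show one way of collecting the values into a hash, as the question asked.

```perl
use strict;
use warnings;

my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][204|0|{A=9,B=201,C=61,D=11}|Calculator]";

# List context: one call returns the capture from every match.
my @matches = $s =~ /\|([^|]+?)\]/g;
print "$_\n" for @matches;
# prints:
# !
# Calculator

# Scalar context: each evaluation finds the next match, so a while
# loop can process the captures one at a time.
my %seen;
while ($s =~ /\|([^|]+?)\]/g) {
  $seen{$1} = 1;
}
```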


You really don't need a regex match... just use split(). The results are stored in %results.

my $s = "[0|0|{A=145,B=2,C=12,D=18}|!][0|0|{A=167,B=2,C=67,D=17}|.1iit][196|0|{A=244,B=6,C=67,D=12}|10:48AM][204|0|{A=9,B=201,C=61,D=11}|Calculator][66|0|{A=145,B=450,C=49,D=14}|Coffee]";
my %results;
foreach my $part (split(/\]/, $s))
{
    # The last pipe-separated piece is the value we want
    my @pieces = split(/\|/, $part);
    $results{$pieces[-1]} = $pieces[-1];
}


With regexes, when you think “I want the last of,” you should immediately think of the pattern .* because regex greed does just what you want.

For example, matching /^(.*)a(.*)$/ chops up "abababab" into

  • ababab in $1
  • a matched by the literal in the pattern
  • b in $2

Let's think through the process of the match. Imagine .* as Augustus Gloop.

Augustus: Ausgezeichnet! The ^ anchor means I get to start at the beginning. From there, I shall eat all the candies!

Willy Wonka: But, my dear Augustus, you must share with the other children.

Augustus: Fine, I get "abababa" and they get "b". Happy?

Willy Wonka: But the next child in line doesn't like b candies.

Augustus: Then I shall keep "ababab" for myself and leave "ab" for the others.

At this point, Augustus has his big pile, humble little Charlie Bucket gets his single a, and Veruca Salt—although scowling about the meager quantity—gets at least something now.

In other words, $2 contains everything after the last a. To be persnickety, the ^ and $ anchors are redundant, but I like keeping them for added emphasis.
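
The split described above is easy to check directly. The sketch below runs the same pattern on "abababab" and, for contrast, a non-greedy variant (.*?), which is not part of the original explanation but shows why greed is what picks the last a.

```perl
use strict;
use warnings;

my $text = "abababab";

# Greedy: the first .* takes as much as it can, so the literal 'a'
# ends up matching the LAST a in the string.
my ($g1, $g2) = $text =~ /^(.*)a(.*)$/;
print "greedy: '$g1' + a + '$g2'\n";   # greedy: 'ababab' + a + 'b'

# Non-greedy: .*? takes as little as it can, so the literal 'a'
# matches the FIRST a instead.
my ($l1, $l2) = $text =~ /^(.*?)a(.*)$/;
print "lazy:   '$l1' + a + '$l2'\n";   # lazy:   '' + a + 'bababab'
```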

Putting this into action, you could write

#! /usr/bin/env perl

use strict;
use warnings;

sub last_fields {
  local($_) = @_;

  my @last;
  push @last, $1 =~ /^.*\|(.+)$/ ? $1 : undef
    while /\[(.*?)\]/g;

  wantarray ? @last : \@last;
}

The outer while breaks up the string into [...] chunks and assumes that a right square bracket cannot occur inside a chunk. Within each chunk, we use /^.*\|(.+)$/ to capture in $1 everything after the last pipe; a chunk with no pipe contributes undef.
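
Two details of last_fields may be easier to see on a tiny input: a chunk without any pipe yields undef, and wantarray makes the sub return an array reference in scalar context. The input string below is made up for illustration and does not come from the question.

```perl
use strict;
use warnings;

# Same helper as above, repeated so this snippet runs on its own.
sub last_fields {
  local($_) = @_;

  my @last;
  push @last, $1 =~ /^.*\|(.+)$/ ? $1 : undef
    while /\[(.*?)\]/g;

  wantarray ? @last : \@last;
}

# Hypothetical input: the middle chunk contains no pipe at all.
my @list = last_fields("[a|b][nopipe][x|y|z]");
print join(", ", map { defined $_ ? $_ : "undef" } @list), "\n";
# prints: b, undef, z

# Scalar context returns a reference to the same kind of list.
my $ref = last_fields("[a|b][x|y|z]");
print scalar(@$ref), " fields, last is $ref->[-1]\n";
# prints: 2 fields, last is z
```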

Testing it with your example looks like

my $s = "[0|0|{A=145,B=2,C=12,D=18}|!]" .
        "[0|0|{A=167,B=2,C=67,D=17}|.1iit]" .
        "[196|0|{A=244,B=6,C=67,D=12}|10:48AM]" .
        "[204|0|{A=9,B=201,C=61,D=11}|Calculator]" .
        "[66|0|{A=145,B=450,C=49,D=14}|Coffee]";

use Test::More tests => 6;
my @lasts = last_fields $s;

# yes, is_deeply could do this in a single call,
# but it's laid out explicitly here for expository benefit
is $lasts[0], "!";
is $lasts[1], ".1iit";
is $lasts[2], "10:48AM";
is $lasts[3], "Calculator";
is $lasts[4], "Coffee";
is scalar @lasts, 5;

All the tests pass:

$ ./match-last-of 
1..6
ok 1
ok 2
ok 3
ok 4
ok 5
ok 6

The output of prove is nicer. Run it yourself to see the color coding.

$ prove ./match-last-of 
./match-last-of .. ok   
All tests successful.
Files=1, Tests=6,  0 wallclock secs ( 0.02 usr  0.01 sys +  0.02 cusr  0.00 csys =  0.05 CPU)
Result: PASS
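
As the comment in the test script notes, is_deeply could check the whole list in a single call. A sketch of that variant, with last_fields repeated so the snippet is self-contained, might look like this:

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Same helper as in the answer above.
sub last_fields {
  local($_) = @_;

  my @last;
  push @last, $1 =~ /^.*\|(.+)$/ ? $1 : undef
    while /\[(.*?)\]/g;

  wantarray ? @last : \@last;
}

my $s = "[0|0|{A=145,B=2,C=12,D=18}|!]" .
        "[0|0|{A=167,B=2,C=67,D=17}|.1iit]" .
        "[196|0|{A=244,B=6,C=67,D=12}|10:48AM]" .
        "[204|0|{A=9,B=201,C=61,D=11}|Calculator]" .
        "[66|0|{A=145,B=450,C=49,D=14}|Coffee]";

# Compare the entire result list against the expected values at once.
is_deeply [ last_fields($s) ],
          [ "!", ".1iit", "10:48AM", "Calculator", "Coffee" ],
          "last field of every group";
```

The trade-off is that a failure reports only the first differing element, whereas the explicit is calls report each position separately.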