What is the correct syntax for filling a Perl array of hashes from a regex? The example extracts hyperlinks from a page

开发者 https://www.devze.com 2023-01-14 23:20 Source: web
@urls= $field =~ /<a.*?href="(.*?)".*?>.*?<\/a>/mgo; #multi-line, global, compile-once
@text= $field =~ /<a.*?href=".*?".*?>(.*?)<\/a>/mgo;
for ($count=0; $count<(scalar @urls); $count++){
    print "\"".$text[$count]."\" goes to ->\"".$urls[$count]."\"\n";}

What is the correct syntax to make this the same as the previous lines?

(@array_of_links->{"url"}, @array_of_links->{"text"}) = $field =~ /<a.*?href="(.*?)".*?>(.*?)<\/a>/mgo;
for ($count=0; $count<(scalar @array_of_links); $count++){
    print "\"".$array_of_links[$count]{"text"}."\" goes to ->\"".$array_of_links[$count]{"url"}."\"\n";}


I think what you want is:

#!/usr/bin/perl

use strict;
use warnings;

my @array_of_links;

my $field = <<EOS;
<a href="foo.html">foo</a>
<a href="bar.html">bar</a>
<a href="baz.html">baz</a>
EOS

while ($field =~ m{<a.*?href="(.*?)".*?>(.*?)</a>}g) {
    push @array_of_links, { url => $1, text => $2 };
}

for my $link (@array_of_links) {
    print qq("$link->{text}" goes to -> "$link->{url}"\n);
}

The /o regex modifier does nothing if no strings are interpolated into it (and it probably shouldn't even be used then because of its surprising behavior). The /m regex modifier does nothing because you don't have the ^ or $ anchors in your regex.
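
To make the two modifiers concrete, here is a small sketch. The sample strings and variable names are made up for illustration and are not from the question. It shows /m changing what ^ matches, and /o silently keeping the first interpolated pattern:

```perl
use strict;
use warnings;

my $text = "first line\nsecond line\n";

# Without /m, ^ anchors only at the very start of the string.
my @without_m = $text =~ /^(\w+)/g;
# With /m, ^ also anchors right after each embedded newline.
my @with_m    = $text =~ /^(\w+)/mg;

print "without /m: @without_m\n";   # without /m: first
print "with /m: @with_m\n";         # with /m: first second

# /o freezes an interpolated pattern at its first compilation, so the
# second pass still tests against "cat" even though $word is now "dog".
my @o_results;
for my $word (qw(cat dog)) {
    push @o_results, ("dog" =~ /\A$word\z/o ? "match" : "no match");
}
print "with /o: @o_results\n";
```

Since the regex in the question has neither anchors nor interpolated variables, dropping both modifiers should not change its behavior.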

You can't create an array of hashes that way. You may want to reread perldoc perldsc.
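
If you do want to keep the single list-context match from the question, one possible approach (a sketch, not the only way) is to take the flat list it returns and pair the items up yourself. The two-link $field below is a shortened stand-in for real input:

```perl
use strict;
use warnings;

my $field = qq(<a href="foo.html">foo</a>\n<a href="bar.html">bar</a>\n);

# In list context, m//g with two capture groups returns one flat list:
# (url1, text1, url2, text2, ...).
my @flat = $field =~ m{<a.*?href="(.*?)".*?>(.*?)</a>}g;

# Consume the flat list two items at a time, building one hashref per link.
my @array_of_links;
while (my ($url, $text) = splice @flat, 0, 2) {
    push @array_of_links, { url => $url, text => $text };
}

print "$_->{text} -> $_->{url}\n" for @array_of_links;
```

The while loop ends once @flat is empty, because a list assignment in scalar context evaluates to the number of items on its right-hand side.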

C-style for loops are generally not required in Perl 5. The iterating for loop is much better. If you need to know the index into an array, you should use the range operator:

for my $i (0 .. $#array_of_links) {
    print qq($i. "$array_of_links[$i]{text}" goes to -> "$array_of_links[$i]{url}"\n);
}

Perl 5 allows you to choose your own delimiters for strings and regexes if you use their general forms (e.g. m// for regexes and qq// for double quotes). You can use this to avoid having to use ugly escapes that make your strings and regexes hard to read.
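
As a quick side-by-side (the one-line $line input is just an illustrative sample), the same match and the same output string can be written with the default delimiters and with alternative ones:

```perl
use strict;
use warnings;

my $line = '<a href="foo.html">foo</a>';

# Default delimiters: the slash in </a> and the embedded quotes need escapes.
my ($url_a, $text_a) = $line =~ /<a.*?href="(.*?)".*?>(.*?)<\/a>/;
my $out_a = "\"$text_a\" goes to -> \"$url_a\"\n";

# Alternative delimiters: m{...} and qq(...) need no escapes here.
my ($url_b, $text_b) = $line =~ m{<a.*?href="(.*?)".*?>(.*?)</a>};
my $out_b = qq("$text_b" goes to -> "$url_b"\n);

print $out_a, $out_b;
print "identical\n" if $out_a eq $out_b;
```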

However, it looks like you are trying to use a regex to parse HTML. This is a path that is filled with pain. You should really be looking into how to use a parser instead.
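
For a rough idea of what the parser route could look like, here is a minimal sketch using HTML::TokeParser. That module is my choice for illustration, not something the question mentions; it ships with the HTML-Parser distribution on CPAN and is not part of core Perl, so it has to be installed. The sample markup is invented:

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN (not core)

# Invented sample input; the second link uses single quotes and nested markup,
# which the regex from the question would not handle.
my $field = <<'EOS';
<a href="foo.html">foo</a>
<a class="x" href='bar.html'>bar <b>bold</b></a>
EOS

my $parser = HTML::TokeParser->new(\$field)
    or die "cannot create parser";

my @array_of_links;
while (my $tag = $parser->get_tag('a')) {
    my $url = $tag->[1]{href};       # $tag->[1] is the attribute hashref
    next unless defined $url;        # skip anchors that have no href
    my $text = $parser->get_trimmed_text('/a');   # text up to the closing </a>
    push @array_of_links, { url => $url, text => $text };
}

print qq("$_->{text}" goes to -> "$_->{url}"\n) for @array_of_links;
```

The resulting @array_of_links has the same shape as in the regex version, so the printing loops shown earlier work unchanged.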


I'll begin with the standard disclaimer that parsing HTML with regular expressions is a bad idea.

Evaluate the regular-expression match in scalar context:

In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match.

Then for each match, push a new hashref onto @array_of_links:

my @array_of_links;
push @array_of_links => { url => $1, text => $2 }
  while $field =~ /<a.*?href="(.*?)".*?>(.*?)<\/a>/mgo;

Your code for dumping the contents isn't terribly idiomatic. A Perl-style for loop is clearer and less cluttered syntactically:

for (@array_of_links) {
  print qq["$_->{text}" goes to ->"$_->{url}"\n];
}