Perl Mechanize find all links within Div

Developer (https://www.devze.com), 2023-03-14 15:43. Source: the web

Is there a way to find all links within a specific div using WWW::Mechanize?

I tried find_all_links but couldn't work out how to restrict it to one div. For example:

<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li> 
</ul>
</div>


A handy tool for pulling information out of HTML is HTML::Grabber. It uses jQuery-style selectors to reference elements, so you might do something like this:

use HTML::Grabber;

# Your mechanize stuff here ...

my $dom = HTML::Grabber->new( html => $mech->content );

my @links;
$dom->find('div.sometag a')->each(sub {
    push @links, $_->attr('href');
});


Web::Scraper is useful for scraping.

use strict;
use warnings;
use WWW::Mechanize;
use Web::Scraper;
use URI;    # URI->new is called explicitly below

my $mech = WWW::Mechanize->new;
$mech->env_proxy;
# If you want to login, do it with mechanize.

# In the sample markup the "tags" class is on the <ul>, so select ul.tags rather than li.tags.
my $staff = scrape { process 'div.sometag ul.tags a', 'links[]' => '@href' };
# pass mechanize to scraper as useragent.
$staff->user_agent($mech);

my $res = $staff->scrape( URI->new("http://example.com/") );
for my $link (@{$res->{links}}) {
    warn $link;
}

Sorry, I didn't test this code.
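
If neither module above is available, the same idea can be sketched with HTML::TreeBuilder, which is normally already installed alongside WWW::Mechanize. This is only a minimal sketch, not the method from the answers above: the inline $html string and the extra "other" div are assumptions added for illustration, and in a real script the markup would come from $mech->content.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Assumption: sample markup based on the question, plus one extra div
# to show that links outside div.sometag are ignored.
# With Mechanize you would use: my $html = $mech->content;
my $html = <<'HTML';
<div class="sometag">
<ul class="tags">
<li><a href="/a.html">A</a></li>
<li><a href="/b.html">B</a></li>
</ul>
</div>
<div class="other"><a href="/c.html">C</a></div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @links;

# Find every <div> whose class attribute is exactly "sometag".
# (For multi-class attributes, a regex such as qr/\bsometag\b/ could be
# passed instead of the plain string.)
for my $div ( $tree->look_down( _tag => 'div', class => 'sometag' ) ) {
    # Collect the href of each <a> nested inside that div,
    # skipping anchors that have no href attribute.
    my @anchors = $div->look_down(
        _tag => 'a',
        sub { defined $_[0]->attr('href') },
    );
    push @links, map { $_->attr('href') } @anchors;
}

# Free the parse tree once the links have been extracted.
$tree->delete;

print "$_\n" for @links;
```

The hrefs come back exactly as written in the page (relative here); something like URI->new_abs($href, $mech->base) can turn them into absolute URLs before passing them to $mech->get.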
