Perl parse links from HTML Table

Developer https://www.devze.com 2023-03-16 11:31 Source: web
I'm trying to get the links from a table in HTML. Using HTML::TableExtract, I can parse the table and get the text (e.g. "Ability" and "Abnormal" in the example below), but I cannot get the links contained in the table. For example:

<table id="AlphabetTable">
   <tr>     
   <td>
    <a href="/cate/A/Ability">Ability</a> <span class="count">2650</span>
   </td>  
   <td>
    <a href="/cate/A/Abnormal">Abnormal</a> <span class="count">26</span>
   </td>
   </tr>
</table>

Is there a way to get the links using HTML::TableExtract, or is there another module that could be used in this situation? Thanks.

Part of my code:

$mech->get($link->url());
$te->parse($mech->content);

foreach my $ts ($te->tables) {
   foreach my $row ($ts->rows) {
       print $row->[0];    # it only prints the text part,
                           # but I want its link
   }
}


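Use HTML::LinkExtor, passing the extracted cell content to its parse method. This only finds links if the cell still contains its raw HTML, so HTML::TableExtract needs to be constructed with keep_html => 1; by default it returns just the visible text.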

foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        next unless defined $row->[0];
        # Use a fresh parser per cell: links() accumulates across parse() calls
        my $le = HTML::LinkExtor->new();
        $le->parse($row->[0]);
        $le->eof;
        for my $link_tag ( $le->links ) {
            my ($tag, %links) = @$link_tag;
            # next if $tag ne 'a'; # exclude other kinds of links?
            print "$_\n" for values %links;
        }
    }
}
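
As a minimal sketch of HTML::LinkExtor on its own, assuming the cell content is already raw HTML: the helper name links_in_fragment, the $cell_html string and the http://example.com/ base URL are made up for illustration and are not from the question. Passing a base URL as the second constructor argument asks HTML::LinkExtor to turn relative links such as /cate/A/Ability into absolute ones.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Return the URL of every <a> link found in a fragment of HTML.
# $base is optional; when given, relative links are made absolute.
sub links_in_fragment {
    my ($html, $base) = @_;
    my $le = HTML::LinkExtor->new(undef, $base);  # no callback: links are collected
    $le->parse($html);
    $le->eof;                                     # signal end of input
    my @urls;
    for my $link_tag ($le->links) {
        my ($tag, %attrs) = @$link_tag;
        next unless $tag eq 'a';                  # skip <img>, <link>, etc.
        push @urls, "$_" for values %attrs;       # stringify URI objects
    }
    return @urls;
}

# Hypothetical cell content and base URL, for illustration only.
my $cell_html = '<a href="/cate/A/Ability">Ability</a> <span class="count">2650</span>';
print "$_\n" for links_in_fragment($cell_html, 'http://example.com/');
```

Without the base argument the links come back exactly as written in the href attribute.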


Use the keep_html option in the constructor. The HTML::TableExtract documentation describes it as follows:

keep_html

Return the raw HTML contained in the cell, rather than just the visible text. Embedded tables are not retained in the HTML extracted from a cell. Patterns for header matches must take into account HTML in the string if this option is enabled. This option has no effect if extracting into an element tree structure.

$te = HTML::TableExtract->new( keep_html => 1, headers => [qw(field1 ... fieldN)]);
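
Putting the two suggestions together, the following is a sketch rather than a definitive solution. It assumes the sample table from the question (inlined here with its row closed), selects it by its id through the attribs constraint instead of headers (the sample has no header row), and uses a made-up helper name, table_links. Each cell's raw HTML is handed to a fresh HTML::LinkExtor.

```perl
use strict;
use warnings;
use HTML::TableExtract;
use HTML::LinkExtor;

# Sample markup from the question, inlined so the sketch is self-contained.
my $html = <<'HTML';
<table id="AlphabetTable">
  <tr>
    <td><a href="/cate/A/Ability">Ability</a> <span class="count">2650</span></td>
    <td><a href="/cate/A/Abnormal">Abnormal</a> <span class="count">26</span></td>
  </tr>
</table>
HTML

# Collect the href of every <a> in every cell of the matching table(s).
sub table_links {
    my ($content) = @_;
    my $te = HTML::TableExtract->new(
        keep_html => 1,                          # cells keep their raw HTML
        attribs   => { id => 'AlphabetTable' },  # match the table by its id
    );
    $te->parse($content);
    $te->eof;
    my @urls;
    for my $ts ($te->tables) {
        for my $row ($ts->rows) {
            for my $cell (grep { defined } @$row) {
                my $le = HTML::LinkExtor->new();  # fresh parser per cell
                $le->parse($cell);
                $le->eof;
                for my $link_tag ($le->links) {
                    my ($tag, %attrs) = @$link_tag;
                    push @urls, $attrs{href}
                        if $tag eq 'a' && defined $attrs{href};
                }
            }
        }
    }
    return @urls;
}

print "$_\n" for table_links($html);
```

In the original script, $mech->content would take the place of $html.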