HTML::TableExtract not finding table

https://www.devze.com 2023-03-08 17:58 Source: web
I'm having trouble with some code I've written. It's basically a proof of concept for myself: I'll be running words through it to get another form of each word (fun Icelandic conjugation). In the code I had to add an if statement in case the URL for the word leads to more than one result. In that case I find the relevant link, fetch its content, and use TableExtract to get the table I need. Except I don't get anything useful.

#!perl



use warnings;
use HTML::TableExtract qw(tree);
use LWP::Simple;




sub saekja{
    $table = $te->first_table_found;
    $table_tree = $table->tree;
    $table_html = $table_tree->as_HTML;
};


sub leidretta{
# If the search returns more than one result
    if ($content =~ /orð fundust./){

    $content =~ m/<li><strong><a href="(.*)">/;

# the beginning of the string for the URL
    $upphaf = "http://bin.arnastofnun.is/";
# concatenates the strings to build the URL
    $urlid = $upphaf . $1;
    $content = get($urlid);
    $te  = new HTML::TableExtract( depth=>0, count=>0);



}
};
$content = get("http://bin.arnastofnun.is/leit.php?q=Fiskisl%C3%B3%C3%B0");

&leidretta;
&saekja;

I will admit that I am relatively new to this (I wrote my first Perl almost exactly a week ago). But I am completely stumped, and copious amounts of googling haven't turned up anything useful.


This should get you a bit further:

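#!perl

use strict;
use utf8;
use warnings;
use HTML::TableExtract qw(tree);
use LWP::Simple;

my $content = get("http://bin.arnastofnun.is/leit.php?q=Fiskisl%C3%B3%C3%B0");
die "Could not fetch the search page\n" unless defined $content;

# The search returned more than one result
if ($content =~ /orð fundust./) {

    # Capture the href of the first result link (stop at the closing quote)
    $content =~ m/<li><strong><a href="([^"]*)">/
        or die "No result link found\n";

    my $upphaf = "http://bin.arnastofnun.is/";
    my $urlid = $upphaf . $1;
    $content = get($urlid);
    die "Could not fetch $urlid\n" unless defined $content;

    my $te = HTML::TableExtract->new(depth => 0, count => 0);

    $te->parse($content);   # this was missing

    my $table = $te->first_table_found
        or die "No table found\n";
    my $table_tree = $table->tree;
    my $table_html = $table_tree->as_HTML;

    print $table_html, "\n";
}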

You never called parse, so HTML::TableExtract had nothing to work on. I also had to add use utf8 so that Perl treats the script source as UTF-8 and the non-ASCII characters in it (such as the ð in the regex) are handled properly.
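
To see the effect of parse in isolation, here is a minimal sketch that needs no network access. The HTML string and the helper name first_table_rows are made up for illustration and are not taken from bin.arnastofnun.is. It uses the plain (non-tree) mode of HTML::TableExtract and reads the cell text through rows.

```perl
#!perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for a page fetched with get(); not real data from the site.
my $html = <<'HTML';
<html><body>
<table>
<tr><th>h1</th><th>h2</th></tr>
<tr><td>a</td><td>b</td></tr>
</table>
</body></html>
HTML

# Hypothetical helper: returns the rows of the first top-level table
# as a list of array references, one per row.
sub first_table_rows {
    my ($html) = @_;
    my $te = HTML::TableExtract->new(depth => 0, count => 0);
    $te->parse($html);    # without this call the extractor never sees any HTML
    my $table = $te->first_table_found
        or die "No table found\n";
    return $table->rows;
}

# Print each row as a comma-separated line.
print join(',', @$_), "\n" for first_table_rows($html);
```

If you comment out the parse line, first_table_found has nothing to return and the script dies with "No table found", which is essentially what happened in the original code.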
