Trying to read value at xpath

Developer https://www.devze.com 2023-04-12 03:13 Source: Internet
I'm trying to get the value of the School District listed on this website: http://gis.nyc.gov/dcp/at/f1.jsp?submit=true&house_nbr=310&street_name=Lenox+Avenue&boro=1

I used Firebug to get the XPath of that value: /html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]

And would like to read it in with Perl. I wrote the following code:

#!/usr/bin/perl -w

use HTML::TreeBuilder::XPath;
use Data::Dumper;

my $tree= HTML::TreeBuilder::XPath->new;

$tree->parse_file("test.html");

my @nb=$tree->findvalue( '/html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]');

print Dumper(@nb);

But it just returns $VAR1 = '';.

Any suggestions? To get this to run, I just copied the source from the webpage into test.html.

Thank you!


The start tag of certain HTML elements (HTML, HEAD, BODY and TBODY) is optional. Take a look at

...<table><tr><td>Foo</td></tr></table>...

According to HTML, there are four elements represented by that snippet:

TABLE
   TBODY
      TR
         TD

Firefox creates all four elements, so it gives the following xpath for the TD element:

.../table/tbody/tr/td

HTML::TreeBuilder probably doesn't create a TBODY element when its start tag has been omitted, so it only creates three elements for that snippet:

TABLE
   TR
      TD

You'd need to use the following xpath to locate the TD element:

.../table/tr/td
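
One way to check this yourself is to parse the snippet and try both paths. This is only a minimal sketch, assuming HTML::TreeBuilder::XPath is installed; the variable names are illustrative and not from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# The same snippet as above: the TBODY start tag is omitted.
my $html = '<table><tr><td>Foo</td></tr></table>';

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the tree is finalized

# Path in the style Firefox/Firebug reports, including the implied TBODY.
my $with_tbody = $tree->findvalue('//table/tbody/tr/td');

# The same path with the TBODY step left out.
my $without_tbody = $tree->findvalue('//table/tr/td');

printf "with tbody:    '%s'\n", $with_tbody;
printf "without tbody: '%s'\n", $without_tbody;

$tree->delete;    # release the tree's memory
```

If the parser builds no TBODY element, the first query comes back empty and the second returns the cell text.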

I bet you'll find results if you remove the tbody steps from your XPath, as the TBODY tags are most likely not present in the file.
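
Rather than editing the long path by hand, the tbody steps could be stripped with a substitution. The sketch below uses only core Perl; strip_tbody is a hypothetical helper name, and it assumes that dropping a tbody step (with or without a positional predicate) is what you want, which only holds when the parsed tree really has no TBODY elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# XPath copied from Firebug, as given in the question.
my $firebug_xpath = '/html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]';

# Hypothetical helper: remove every "/tbody" location step,
# including an optional positional predicate such as "/tbody[1]".
sub strip_tbody {
    my ($xpath) = @_;
    (my $stripped = $xpath) =~ s{/tbody(?:\[\d+\])?(?=/|$)}{}g;
    return $stripped;
}

my $xpath = strip_tbody($firebug_xpath);
print "$xpath\n";
```

The resulting string can then be passed to $tree->findvalue in place of the original path. Whether it matches still depends on the rest of the path agreeing with the tree that HTML::TreeBuilder builds.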
