I'm trying to get the value of the School District listed on this website: http://gis.nyc.gov/dcp/at/f1.jsp?submit=true&house_nbr=310&street_name=Lenox+Avenue&boro=1
I used Firebug to get the XPath of that value: /html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]
I would like to read it in with Perl, so I wrote the following code:
#!/usr/bin/perl -w
use HTML::TreeBuilder::XPath;
use Data::Dumper;
my $tree= HTML::TreeBuilder::XPath->new;
$tree->parse_file("test.html");
my @nb=$tree->findvalue( '/html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]/table/tbody/tr/td/table/tbody/tr[3]/td/table/tbody/tr[2]/td[2]/table/tbody/tr[2]/td/table/tbody/tr[10]/td[2]');
print Dumper(@nb);
But it just prints $VAR1 = '';
Any suggestions? To get this to run, I just copied the page source from the website into test.html.
Thank you!
The start tag of certain HTML elements (HTML, HEAD, BODY and TBODY) is optional. Take a look at
...<table><tr><td>Foo</td></tr></table>...
According to HTML, there are four elements represented by that snippet:
TABLE
TBODY
TR
TD
Firefox creates all four elements, so it gives the following xpath for the TD element:
.../table/tbody/tr/td
HTML::TreeBuilder probably doesn't create elements when their start tags have been omitted, so it only creates three elements for that snippet:
TABLE
TR
TD
You'd need to use the following xpath to locate the TD element:
.../table/tr/td
I bet you'll find results if you remove the tbody tests from your xpath, as the TBODY elements are most likely not present in the tree built from the file.
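As a rough sketch of that suggestion, assuming the parser really does leave out TBODY elements whose tags are not written in the source, the browser-generated xpath could be cleaned up mechanically before handing it to findvalue. The helper name strip_tbody and the shortened example path are made up for illustration; they are not part of HTML::TreeBuilder::XPath.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of any module): remove every "tbody"
# step, with or without a positional predicate such as [1], from an
# XPath copied out of a browser tool like Firebug.
sub strip_tbody {
    my ($xpath) = @_;
    # Match "/tbody" or "/tbody[N]" only when it is a complete step,
    # i.e. followed by another "/" or by the end of the string.
    $xpath =~ s{/tbody(?:\[\d+\])?(?=/|\z)}{}g;
    return $xpath;
}

# Shortened example path in the style Firebug reports.
my $browser_xpath = '/html/body/table/tbody/tr[2]/td/table/tbody/tr/td[2]';
my $parser_xpath  = strip_tbody($browser_xpath);

print "$parser_xpath\n";   # prints /html/body/table/tr[2]/td/table/tr/td[2]
```

The resulting string can then be passed to $tree->findvalue in place of the original path. If the page source does contain explicit tbody tags in some tables, this blanket removal would be wrong for those tables, so it is worth checking the source first.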