I'm working on a scraper using XPath, but XPath seems inexplicably incapable of retrieving the information that I need. I've been able to get the code below to print out the table element and all of its contents, but as soon as I try to go to the tbody or tr elements, it starts returning None. You can see the URL below as well.
I've used XPather in Firefox to confirm that the below is correct, but for some reason the path fails once put into Python.
url = 'http://www.arkleg.state.ar.us/assembly/2011/2011R/pages/CommitteeDetail.aspx?committeecode=000'
with self.urlopen(url) as page:
    page = lxml.html.fromstring(page)
    for tr in page.xpath('//table[@class="gridtable"]/tbody/tr'):
        print tr.xpath('string(td[1])')
Firefox adds the implicit tbody inside the table element, but this doesn't exist in the source HTML for that page. This XPath should work to find all the tr tags:
for node in page.xpath('.//table[@class="gridtable"]/tr'):
    print(node.xpath('string(td[1])'))
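If you aren't sure whether a given page's source includes tbody, one option is to match both layouts with an XPath union. Below is a minimal sketch of that idea. The two HTML strings and the first_cells helper are made up for illustration and are not taken from the page in the question; only the gridtable class comes from it.

```python
import lxml.html

# Hypothetical sample markup: the same rows, with and without an explicit <tbody>.
ROWS = "<tr><td>Committee A</td></tr><tr><td>Committee B</td></tr>"
WITHOUT_TBODY = ('<html><body><table class="gridtable">%s</table>'
                 '</body></html>' % ROWS)
WITH_TBODY = ('<html><body><table class="gridtable"><tbody>%s</tbody></table>'
              '</body></html>' % ROWS)


def first_cells(html):
    """Return the text of the first cell of every gridtable row (hypothetical helper)."""
    doc = lxml.html.fromstring(html)
    # Union of the two possible paths: rows directly under <table>,
    # and rows nested in a <tbody> that is present in the source.
    rows = doc.xpath('//table[@class="gridtable"]/tr'
                     ' | //table[@class="gridtable"]/tbody/tr')
    return [row.xpath('string(td[1])').strip() for row in rows]


print(first_cells(WITHOUT_TBODY))
print(first_cells(WITH_TBODY))
```

Both calls should give the same list of cell texts, since lxml parses the markup as written and does not insert a tbody the way Firefox's DOM does.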