Working in lxml, I want to get the href attribute of all links with an img child that has title="Go to next page".
So in the following snippet:
<a class="noborder" href="StdResults.aspx">
<img src="arrowr.gif" title="Go to next page"></img>
</a>
I'd like to get StdResults.aspx back.
I've got this far:
next_link = doc.xpath("//a/img[@title='Go to next page']")
print(next_link[0].attrib['href'])  # KeyError: the matched <img> has no href
But next_link is the img, not the a tag. How can I get the a tag?
Thanks.
Just change a/img... to a[img...] (the brackets roughly mean "such that"):
import lxml.html as lh
content='''<a class="noborder" href="StdResults.aspx">
<img src="arrowr.gif" title="Go to next page"></img>
</a>'''
doc=lh.fromstring(content)
# select every <a> that has an <img> child with the matching title
for elt in doc.xpath("//a[img[@title='Go to next page']]"):
    print(elt.attrib['href'])
# StdResults.aspx
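To see the predicate actually doing the filtering, here is a minimal sketch with a hypothetical page containing two links (the wrapping div, the Prev.aspx link and its "Go to previous page" image are made up for illustration). Only the a whose img child carries the matching title is returned.

```python
import lxml.html as lh

# Hypothetical markup: two links, only the second has the "next page" image
content = '''<div>
<a href="Prev.aspx"><img src="arrowl.gif" title="Go to previous page"></a>
<a class="noborder" href="StdResults.aspx"><img src="arrowr.gif" title="Go to next page"></a>
</div>'''
doc = lh.fromstring(content)

# a[img[...]] keeps only <a> elements that have a matching <img> child
links = doc.xpath("//a[img[@title='Go to next page']]")
hrefs = [a.get('href') for a in links]
print(hrefs)  # ['StdResults.aspx']
```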
Or you could go even further and use "//a[img[@title='Go to next page']]/@href" to retrieve the values of the href attributes directly.
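A minimal sketch of that variant, reusing the snippet from the question: because the path ends in /@href, xpath() returns the attribute values themselves (lxml's string-like "smart strings") instead of elements, so no .attrib lookup is needed.

```python
import lxml.html as lh

content = '''<a class="noborder" href="StdResults.aspx">
<img src="arrowr.gif" title="Go to next page"></img>
</a>'''
doc = lh.fromstring(content)

# Ending the path in /@href makes xpath() return a list of attribute values
hrefs = doc.xpath("//a[img[@title='Go to next page']]/@href")
print(hrefs[0])  # StdResults.aspx
```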
You can also select the parent node or an arbitrary ancestor by using the XPath expressions //a/img[@title='Go to next page']/parent::a or //a/img[@title='Go to next page']/ancestor::a, respectively.
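A sketch comparing both axes on the question's snippet, plus lxml's Element.getparent() as a non-XPath alternative that starts from the img elements the question's own query already finds. Using //img (rather than //a/img) in the ancestor:: query is just an illustrative choice to show that it does not require the a to be the direct parent.

```python
import lxml.html as lh

content = '''<a class="noborder" href="StdResults.aspx">
<img src="arrowr.gif" title="Go to next page"></img>
</a>'''
doc = lh.fromstring(content)

# Start from the <img> and step back up with the parent:: axis...
via_parent = doc.xpath("//a/img[@title='Go to next page']/parent::a/@href")
# ...or with ancestor::, which would also match an <a> further up the tree
via_ancestor = doc.xpath("//img[@title='Go to next page']/ancestor::a/@href")

# Alternatively, keep the question's query and walk up with getparent()
imgs = doc.xpath("//a/img[@title='Go to next page']")
via_getparent = [img.getparent().get('href') for img in imgs]

print(via_parent[0], via_ancestor[0], via_getparent[0])
# StdResults.aspx StdResults.aspx StdResults.aspx
```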