<span class='python'>
<a>google</a>
<a>chrome</a>
</span>
I want to get chrome, and I already have it working like this:
q = item.findall('.//span[@class="python"]//a')
t = q[1].text # first element = 0
I'd like to combine it into a single XPath expression and just get one item instead of a list.
I tried this, but it doesn't work:
t = item.findtext('.//span[@class="python"]//a[2]') # first element = 1
And the actual, not simplified, HTML is like this.
<span class='python'>
<span>
<span>
<img></img>
<a>google</a>
</span>
<a>chrome</a>
</span>
</span>
I tried this but it doesn't work.
t = item.findtext('.//span[@class="python"]//a[2]')
This is a FAQ about the // abbreviation.
.//a[2] means: select all a descendants of the current node that are the second a child of their parent. So this may select more than one element, or no element at all, depending on the concrete XML document.
To put it more simply, the [] operator has higher precedence than //.
If you want just one (the second) of all the nodes returned, you have to use parentheses to force the precedence you want:
(.//a)[2]
This really selects the second a descendant of the current node.
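The per-parent behaviour of the predicate can be checked with a minimal sketch using the standard library's xml.etree.ElementTree. The sample document and variable names are made up for illustration. ElementTree only implements a subset of XPath and does not accept a parenthesized expression, so the "second of all descendants" case is shown by indexing the full result list instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
root = ET.fromstring(
    "<root>"
    "<p><a>one</a><a>two</a></p>"
    "<p><a>three</a><a>four</a></p>"
    "</root>"
)

# .//a[2]: every <a> that is the second <a> child of its own parent.
per_parent = [a.text for a in root.findall(".//a[2]")]

# Second <a> among all descendants: index the complete result list,
# which is what (.//a)[2] expresses in full XPath.
second_overall = root.findall(".//a")[1].text

print(per_parent)      # ['two', 'four']
print(second_overall)  # two
```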
For the actual expression used in the question, change it to:
(.//span[@class="python"]//a)[2]
or change it to:
(.//span[@class="python"]//a)[2]/text()
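As a hedged sketch of how this might look with lxml: as far as I know, find(), findall() and findtext() in lxml go through the limited ElementPath syntax, which does not accept a parenthesized expression, so the example calls xpath() instead. The wrapping div and the variable names are assumptions added for illustration; the inner markup is the actual HTML from the question.

```python
from lxml import etree

# The question's actual markup, wrapped in a <div> (an assumption) so that
# the span is a descendant of the context node.
doc = etree.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

expr = '(.//span[@class="python"]//a)[2]'

second_link = doc.xpath(expr)[0]               # xpath() returns a list of elements
text_nodes = doc.xpath(expr + "/text()")       # list of text nodes
as_string = doc.xpath("string(" + expr + ")")  # a single string, no list

print(second_link.text)  # chrome
print(text_nodes)        # ['chrome']
print(as_string)         # chrome
```

Wrapping the expression in string() is one way to get a single value back rather than a list, which is what the question asked for.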
I'm not sure what the problem is...
>>> d = """<span class='python'>
... <a>google</a>
... <a>chrome</a>
... </span>"""
>>> from lxml import etree
>>> d = etree.HTML(d)
>>> d.xpath('.//span[@class="python"]/a[2]/text()')
['chrome']
>>>
From Comments:
or the simplification of the actual HTML I posted is too simple
You are right. What is the meaning of .//span[@class="python"]//a[2]? This will be expanded to:
self::node()
/descendant-or-self::node()
/child::span[attribute::class="python"]
/descendant-or-self::node()
/child::a[position()=2]
It will finally select the second a child (fn:position() refers to the child axis). So nothing will be selected if your document looks like this:
<span class='python'>
<span>
<span>
<img></img>
<a>google</a><!-- This is the first "a" child of its parent -->
</span>
<a>chrome</a><!-- This is also the first "a" child of its parent -->
</span>
</span>
If you want the second of all descendants, use:
descendant::span[@class="python"]/descendant::a[2]
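The two forms can be compared side by side with lxml. This is a sketch under assumptions: the wrapping div and the variable names are added for illustration, and the inner markup is the nested HTML from the question.

```python
from lxml import etree

# Nested markup from the question, wrapped in a <div> (an assumption) so
# that the span is a descendant of the context node.
doc = etree.fromstring(
    "<div><span class='python'><span><span><img></img><a>google</a></span>"
    "<a>chrome</a></span></span></div>"
)

# Last step on the child axis: no <a> is the second <a> child of its parent.
child_step = doc.xpath('.//span[@class="python"]//a[2]/text()')

# Last step on the descendant axis: position is counted over all <a>
# descendants of the span, in document order.
descendant_step = doc.xpath(
    'descendant::span[@class="python"]/descendant::a[2]/text()'
)

print(child_step)       # []
print(descendant_step)  # ['chrome']
```

Note that descendant::a[2] is evaluated once per matching span, so a document with several such spans can still yield more than one result.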