Python + XPath: Is it possible to select the next element after the one I actually want?

Developer https://www.devze.com 2023-04-05 13:44 Source: the web
Suppose I have something like this:

<span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br><a href="http://example.com/image.jpg" target="_blank">

What I want to extract from it is http://example.com/image.jpg and what the file is actually called.jpg. The constant term is the <span class="filesize">File, which I can find using xpath("span[text()='File']"), but that only gives me access to the span. Is there a way to do something like result += 1 to go to the link afterward and then the span after it with the file name?


You can use the following-sibling and preceding-sibling XPath "axes" to do the navigation you need. You can get details here.

EDIT:

Here's an example that gets the result you want using only XPath. However, it may not work for you, depending on what the surrounding XML is like. (I've also had to complete some of the tags to make it "real" XML. You may be able to get it working without doing that by putting your XML parser into HTML mode.)

import lxml.etree

# The tags are completed (<br/>, </a>) and wrapped in a root element so the snippet parses as XML.
xml = lxml.etree.XML("""<something><span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br/><a href="http://example.com/image.jpg" target="_blank"></a></something>""")

# Select the <a> that has a preceding sibling <span> with the text node 'File' and return its href.
print(xml.xpath("a[preceding-sibling::span/text()='File']/@href"))
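If you would rather not repair the markup by hand, the HTML-mode route mentioned above might look roughly like the sketch below. It assumes lxml.html is available and that the file name can be read from the title attribute of the inner span. The helper name extract_file_info and the wrapping div are my own choices for illustration, not anything from the question.

```python
import lxml.html


def extract_file_info(html_fragment):
    """Return (href, real file name) pairs for each 'File' span (hypothetical helper)."""
    # The HTML parser tolerates the unclosed <br> and trailing <a>;
    # create_parent wraps the fragment in a <div> so there is a single root.
    root = lxml.html.fragment_fromstring(html_fragment, create_parent="div")
    results = []
    # Anchor on the constant part: a filesize span whose first text node is "File".
    for span in root.xpath('.//span[@class="filesize"][text()[1]="File"]'):
        # The link is the first <a> child of that span.
        hrefs = span.xpath("a[1]/@href")
        # The real file name sits in the first <span> sibling that follows the link.
        names = span.xpath("a[1]/following-sibling::span[1]/@title")
        if hrefs and names:
            results.append((str(hrefs[0]), str(names[0])))
    return results


# The snippet from the question, unmodified.
snippet = (
    '<span class="filesize">File'
    '<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>'
    "-(1.61 MB, 1000x1542, "
    '<span title="what the file is actually called.jpg">'
    "what the file is actually called.jpg</span>)</span>"
    '<br><a href="http://example.com/image.jpg" target="_blank">'
)

print(extract_file_info(snippet))
# [('http://example.com/image.jpg', 'what the file is actually called.jpg')]
```

Here the following-sibling axis plays the role of the "result += 1" step from the question: starting from the link, it moves to the next span at the same level. If the surrounding page nests these elements differently, the paths would need adjusting.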