Suppose I have something like this:
<span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br><a href="http://example.com/image.jpg" target="_blank">
What I want to extract from it is http://example.com/image.jpg and what the file is actually called.jpg. The constant term is the <span class="filesize">File, which I can find using xpath("span[text()='File']"), but that only gives me access to the span. Is there a way to do something like result += 1 to go to the link afterward, and then to the span after it with the file name?
You can use the following-sibling and preceding-sibling XPath "axes" to do the navigation you need.
You can get details here.
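As a rough sketch of how those two axes behave on the markup from the question: the snippet below starts from the span you can already find and steps sideways to the link after it, then back again. The <something> wrapper element and the variable names are assumptions added for illustration, and the tags have been closed so the fragment parses as XML.

```python
import lxml.etree

# Sample markup from the question, wrapped in a hypothetical <something> root
# and with <br> and the trailing <a> closed so it is well-formed XML.
doc = lxml.etree.XML(
    '<something><span class="filesize">File'
    '<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>'
    '-(1.61 MB, 1000x1542, '
    '<span title="what the file is actually called.jpg">'
    'what the file is actually called.jpg</span>)</span><br/>'
    '<a href="http://example.com/image.jpg" target="_blank"></a></something>'
)

# Start at the span located by its constant "File" text.
span = doc.xpath("span[text()='File']")[0]

# following-sibling:: looks at later nodes on the same level; [1] picks the nearest <a>.
next_link = span.xpath("following-sibling::a[1]")[0]
href = next_link.get("href")

# preceding-sibling:: walks the other way, from the link back to the nearest span.
back = next_link.xpath("preceding-sibling::span[1]")[0]

print(href)
print(back.get("class"))
```

Calling xpath() on an element evaluates the expression relative to that element, which is roughly the "result += 1" step asked about.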
EDIT:
Here's an example that gets the result you want using only XPath. It may not work for you, depending on what the surrounding XML looks like. I also had to close some of the tags to make it well-formed XML; you may be able to avoid that by putting your parser into HTML mode.
import lxml.etree
# <something> is a placeholder root; <br> and the trailing <a> are closed so the string parses as XML.
xml = lxml.etree.XML("""<something><span class="filesize">File<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>-(1.61 MB, 1000x1542, <span title="what the file is actually called.jpg">what the file is actually called.jpg</span>)</span><br/><a href="http://example.com/image.jpg" target="_blank"></a></something>""")
# Select the href of any <a> that has an earlier sibling span containing the text "File".
print(xml.xpath("a[preceding-sibling::span/text()='File']/@href"))
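For the HTML-mode route and the second value the question asks for (the real file name), something along these lines may work. It is only a sketch: it assumes lxml's HTML parser (lxml.etree.HTML) is acceptable, that the markup arrives exactly as posted with the trailing <a> left unclosed, and that the file name always sits in the title attribute of the span nested inside the "File" span. The variable names are made up for illustration.

```python
import lxml.etree

# The fragment as posted in the question, including the unclosed trailing <a>.
snippet = (
    '<span class="filesize">File'
    '<a href="http://example.com/image.jpg" target="_blank">image.jpg</a>'
    '-(1.61 MB, 1000x1542, '
    '<span title="what the file is actually called.jpg">'
    'what the file is actually called.jpg</span>)</span><br>'
    '<a href="http://example.com/image.jpg" target="_blank">'
)

# The HTML parser tolerates the missing end tags and adds html/body wrappers.
root = lxml.etree.HTML(snippet)

# "//" searches the whole tree, so the wrappers added by the parser do not matter.
hrefs = root.xpath("//span[text()='File']/following-sibling::a[1]/@href")

# The file name is a child of the "File" span, so a plain child step reaches it.
names = root.xpath("//span[text()='File']/span/@title")

print(hrefs, names)
```

Both calls return lists, so an empty list signals that the expected structure was not found.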