Python lxml XPath problem

Developer https://www.devze.com 2023-02-17 12:47 Source: Internet
I'm trying to print/save a certain element's HTML from a web page.

I've retrieved the requested element's XPath from Firebug.

All I wish is to save this element to a file. I don't seem to succeed in doing so.

(tried the XPath with and without a /text() at the end)

I would appreciate any help, or past experience.

10x, David

import urllib2,StringIO
from lxml import etree

url='http://www.tutiempo.net/en/Climate/Londres_Heathrow_Airport/12-2009/37720.htm'
seite = urllib2.urlopen(url)
html = seite.read()
seite.close()
parser = etree.HTMLParser()
tree = etree.parse(StringIO.StringIO(html), parser)
xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td/table/tbody/tr/td[3]/table/tbody/tr[3]/td/table/tbody/tr/td/table/tbody/tr/td/table/tbody/text()"
elem = tree.xpath(xpath)


print elem[0].strip().encode("utf-8")


Your XPath is obviously a bit too long; why don't you try shorter ones and see whether they match? One likely problem is "tbody": browsers create it automatically in the DOM (which is what Firebug shows), but the HTML markup usually does not contain it, so a path with tbody steps may match nothing in the parsed source.
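To test the tbody theory quickly, you could strip those steps from the Firebug path before running it. This is only a minimal sketch in Python 3 (the question's code is Python 2); strip_tbody is a hypothetical helper name, not part of lxml, and it assumes the page's markup really has no tbody tags:

```python
import re


def strip_tbody(xpath):
    """Drop every `tbody` step (with an optional [n] index) from an XPath."""
    # The lookahead keeps names that merely start with "tbody" untouched.
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# Shortened version of the path from the question.
firebug_xpath = "/html/body/table/tbody/tr/td[2]/div/table/tbody/tr[6]/td"
print(strip_tbody(firebug_xpath))
```

The result can be passed to tree.xpath(...) in place of the original string. If the page does contain literal tbody tags, this would break the path instead, so treat it as a diagnostic rather than a fix.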

Here's an example of how to use XPath results:

>>> from lxml import etree
>>> from StringIO import StringIO
>>> doc = etree.parse(StringIO("<html><body>a<something/>b</body></html>"), etree.HTMLParser())
>>> doc.xpath("/html/body/text()")
['a', 'b']

So you could just "".join(...) all text parts together if needed.


Not sure I completely follow what you are trying to accomplish, but ultimately I think you are looking for:

print etree.tostring(elem[0])