What is the simplest way to get element content by XPath with Python?

devze.com https://www.devze.com 2023-02-01 05:33 Source: the web
I need to get content for this XPath:

/html/body/div/table[2]/tbody/tr/td[2]

It's copied from Firebug. How can I do this? I have a very large HTML document, so I don't want (and don't know how :) ) to grep it. Thanks.


lxml can handle HTML (and provides pretty good XPath support):

>>> import lxml.html
>>> tree = lxml.html.parse('test.html')
>>> for node in tree.xpath('/html/body/div/table[2]/tbody/tr/td[2]'):
...     print(node.text)
...
first row, second column
second row, second column

Just make sure that you use its HTML parser.
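
One thing that may be worth checking when the path comes from Firebug: the browser shows its own DOM, which can contain a tbody element that is not in the HTML source, and as far as I know lxml's HTML parser does not add one. Here is a minimal sketch of that situation. The SAMPLE markup and the cell_texts helper are made up for illustration and are not part of the question.

```python
import lxml.html

# Hypothetical sample markup (no explicit <tbody> in the source).
SAMPLE = """
<html><body><div>
  <table><tr><td>ignored</td></tr></table>
  <table>
    <tr><td>r1c1</td><td>first row, second column</td></tr>
    <tr><td>r2c1</td><td>second row, second column</td></tr>
  </table>
</div></body></html>
"""


def cell_texts(markup, xpath):
    """Return the stripped text content of every node matched by xpath."""
    root = lxml.html.document_fromstring(markup)
    return [node.text_content().strip() for node in root.xpath(xpath)]


# Path as copied from the browser (with tbody) versus the same path without it.
with_tbody = cell_texts(SAMPLE, "/html/body/div/table[2]/tbody/tr/td[2]")
without_tbody = cell_texts(SAMPLE, "/html/body/div/table[2]/tr/td[2]")

print(with_tbody)
print(without_tbody)
```

If the copied path returns nothing for your file, dropping the tbody step (or using //table[2]//td[2]) is a reasonable first thing to try.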


import lxml.html as h

tree = h.parse("keys_results.html")
# string() returns the full text of the first element matched by the predicate
text = tree.xpath("string(//*[contains(text(),'needed_text')])")
print(text)
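
To illustrate what the string() wrapper changes, here is a small sketch on inline markup. The SAMPLE string is a made-up stand-in for keys_results.html, so treat it as an assumption rather than the real file.

```python
import lxml.html

# Hypothetical markup standing in for keys_results.html.
SAMPLE = (
    "<html><body>"
    "<p>other</p>"
    "<p>some needed_text <b>here</b></p>"
    "</body></html>"
)

root = lxml.html.document_fromstring(SAMPLE)

# With string(), the first matching element is converted to its full text,
# including the text of child elements, so the result is a plain string.
text = root.xpath("string(//*[contains(text(), 'needed_text')])")

# Without string(), the same predicate returns a list of matching elements.
nodes = root.xpath("//*[contains(text(), 'needed_text')]")

print(repr(text))
print([node.tag for node in nodes])
```

So the string() form is handy when you only want the text of the first match; use the plain node query when you need every matching element.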