
HTML element position in Python

开发者 (https://www.devze.com), 2023-01-27 23:50. Source: Internet

I'm using lxml.html for some HTML parsing in Python. I'd like to get a rough estimate of where elements would be located on the page once a browser has rendered it. It does not have to be exact, just generally correct. For simplicity I will ignore the effects of JavaScript on element location. As an end result, I would like to be able to iterate over the elements (e.g., via lxml) and find their x/y coordinates. Any thoughts on how to do this? I don't need to stay with lxml and am happy to try other libraries.


Use PyQt with WebKit:

import sys
from PyQt4.QtCore import QObject, QUrl, SIGNAL
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebView

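class MyWebView(QWebView):
    def __init__(self):
        QWebView.__init__(self)
        # Inspect the page only after WebKit has finished loading it
        QObject.connect(self, SIGNAL('loadFinished(bool)'), self.showelements)

    def showelements(self):
        # Root element of the rendered document in the current frame
        html = self.page().currentFrame().documentElement()
        for link in html.findAll('a'):
            # geometry() is a QRect; getRect() unpacks it to (x, y, width, height)
            print(link.toInnerXml(), link.geometry().getRect())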


if __name__=='__main__':
    app = QApplication(sys.argv)

    web = MyWebView()
    web.load(QUrl("http://www.google.com"))
    web.show()

    sys.exit(app.exec_())
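
Once a rendering engine has reported a rectangle for each element (such as the QRect values from link.geometry() above), the question's goal of iterating over elements by their x/y coordinates might look roughly like the sketch below. It is pure Python and does no rendering itself. The names ElementBox, center, reading_order and row_tolerance are hypothetical helpers invented for illustration, and the sample coordinates are made up rather than taken from a real page.

```python
from collections import namedtuple

# Hypothetical record for one rendered element: the tag name plus the
# (x, y, width, height) values reported by a rendering engine.
ElementBox = namedtuple("ElementBox", "tag x y width height")


def center(box):
    """Return the (x, y) midpoint of a box."""
    return (box.x + box.width / 2.0, box.y + box.height / 2.0)


def reading_order(boxes, row_tolerance=10):
    """Sort boxes top to bottom, then left to right.

    Top edges are bucketed into fixed bands of row_tolerance pixels, so
    elements on roughly the same line sort by x. This is only a rough
    grouping: two boxes just either side of a band edge land in
    different rows.
    """
    return sorted(boxes, key=lambda b: (b.y // row_tolerance, b.x))


# Made-up sample data standing in for real engine output.
boxes = [
    ElementBox("a", 300, 12, 80, 20),
    ElementBox("a", 10, 15, 60, 20),
    ElementBox("a", 10, 200, 120, 20),
]

for box in reading_order(boxes):
    print(box.tag, center(box))
```

Keeping the coordinates in a plain structure like this means the rest of the processing does not depend on which rendering engine produced them.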


As stated by Sven, you need an HTML rendering engine. A question on rendering HTML has been asked before; you could refer to it:

Python library for rendering HTML and javascript
