
BeautifulSoup get innerhtml data

开发者 (https://www.devze.com), 2023-01-06 14:59, source: web

I am trying to read data from a website. I can see the value I need in the browser, but it does not appear in the HTML I download with urllib2. The value is generated by a JavaScript file and inserted into the page as the innerHTML of the element with that id. How can it be extracted? Unlike a browser, the raw source code doesn't run the JS!
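A minimal sketch of why the value never shows up, assuming a hypothetical page in which an inline script fills in an empty `<span id="price">`. It uses the standard library's `html.parser` so it runs without installing anything; BeautifulSoup parses the same static string, so `soup.find(id="price").get_text()` would likewise return an empty string. `RAW_PAGE`, `text_by_id` and the `price` id are illustrative names, not taken from the question.

```python
from html.parser import HTMLParser

# Hypothetical page: the span is empty in the source and only a script fills it.
# In Python 3, urllib2 became urllib.request, so a live fetch would look like:
#   raw = urllib.request.urlopen(url).read().decode("utf-8")
RAW_PAGE = """
<html><body>
  <p>Price: <span id="price"></span></p>
  <script>
    document.getElementById("price").innerHTML = "42.00";
  </script>
</body></html>
"""

# Elements without a closing tag; they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextById(HTMLParser):
    """Collect the text inside the element whose id attribute matches."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.depth = 0      # nesting depth inside the target element (0 = outside)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html, element_id):
    """Return the element's text as it appears in the static HTML (no JS run)."""
    parser = TextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(repr(text_by_id(RAW_PAGE, "price")))  # the span is empty in the source
print("42.00" in RAW_PAGE)                  # the value exists only inside the script
```

Running it prints `''` and then `True`: the value is present only as JavaScript source, so something has to execute that script before the element can be read.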


Another way to get the data is to let the browser do all the work: drive it with Selenium and read the rendered HTML. It's a bit slow, but it works reliably.

Here is a getting-started guide to using Selenium with Python: http://jimmyg.org/blog/2009/getting-started-with-selenium-and-python.html
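A minimal Selenium sketch of that approach, assuming Selenium 4 and a local Firefox install (Selenium 4.6 and later can usually locate geckodriver on its own; otherwise it has to be on the PATH). The function name, URL and element id are placeholders. It waits until the element's innerHTML is non-empty rather than merely present, because the element can exist before the script has filled it in.

```python
def fetch_rendered_inner_html(url, element_id, timeout=10):
    """Load url in a headless Firefox and return the innerHTML of element_id
    once a script has populated it. Raises TimeoutException if it stays empty."""
    # Imported here so this module can be imported even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")  # no visible browser window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # until() keeps polling and returns the first truthy value of the lambda.
        # WebDriverWait ignores NoSuchElementException by default, so an element
        # that the script creates late is also handled.
        return WebDriverWait(driver, timeout).until(
            lambda d: d.find_element(By.ID, element_id).get_attribute("innerHTML").strip()
        )
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Usage (hypothetical URL and id): `fetch_rendered_inner_html("https://example.com/quote", "price")`. To parse the whole rendered document with BeautifulSoup instead, read `driver.page_source` inside the `try` block before quitting.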


You have two options: have the browser save the DOM (which includes all changes made by scripts), or use a JavaScript engine to execute the embedded scripts yourself.

For the latter, try a Java-based engine such as Rhino and emulate the browser environment with env.js.
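Whichever engine you choose, it first needs the page's scripts in the order the page declares them. A minimal sketch of that extraction step using only the standard library; `collect_scripts` and `PAGE` are hypothetical names, and actually executing the code (for example in Rhino, with env.js supplying `window` and `document`) is not shown.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Record every <script> in document order, as ("src", url) for external
    scripts or ("inline", code) for inline ones."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._buffer = None   # text chunks while inside an inline <script>, else None

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src:
            # Browsers ignore the body of a <script src=...>, so only the URL matters.
            self.scripts.append(("src", src))
        else:
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            code = "".join(self._buffer).strip()
            if code:
                self.scripts.append(("inline", code))
            self._buffer = None


def collect_scripts(html):
    """Return the page's scripts in document order (async/defer are ignored)."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return collector.scripts


PAGE = """
<html><head><script src="/static/app.js"></script></head>
<body><span id="price"></span>
<script>document.getElementById("price").innerHTML = "42.00";</script>
</body></html>
"""

for kind, value in collect_scripts(PAGE):
    print(kind, value)
```

Relative `src` values such as `/static/app.js` would still need to be resolved against the page URL (for instance with `urllib.parse.urljoin`) and downloaded before being handed to the engine.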

