开发者

Selenium with Python, how do I get the page output after running a script?

开发者 https://www.devze.com 2023-01-13 07:11 出处:网络
I\'m not sure how to find this information, I have found a few tutorials so far about using Python with selenium but none have so much as touched on this.. I am able to run some basic test scripts thr

I'm not sure how to find this information, I have found a few tutorials so far about using Python with selenium but none have so much as touched on this.. I am able to run some basic test scripts through python that automate selenium but it just shows the browser window for a few seconds and then closes it.. I need to get the b开发者_C百科rowser output into a string / variable (ideally) or at least save it to a file so that python can do other things on it (parse it, etc).. I would appreciate if anyone can point me towards resources on how to do this. Thanks


using Selenium Webdriver and Python, you would simply access the .page_source property to get the source of the current page.

for example, using Firefox() driver:

from selenium import webdriver


driver = webdriver.Firefox()
driver.get('http://www.example.com/')

print(driver.page_source)

driver.quit()


There's a Selenium.getHtmlSource() method in Java, most likely it is also available in Python. It returns the source of the current page as string, so you can do whatever you want with it


Ok, so here is how I ended up doing this, for anyone who needs this in the future..

You have to use firefox for this to work.

1) create a new firefox profile (not necessary but ideal so as to separate this from normal firefox usage), there is plenty of info on how to do this on google, it depends on your OS how you do this

2) get the firefox plugin: https://addons.mozilla.org/en-US/firefox/addon/2704/ (this automatically saves all pages for a given domain name), you need to configure this to save whichever domains you intend on auto-saving.

3) then just start the selenium server to use the profile you created (below is an example for linux)

cd /root/Downloads/selenium-remote-control-1.0.3/selenium-server-1.0.3 
java -jar selenium-server.jar -firefoxProfileTemplate /path_to_your_firefox_profile/

Thats it, it will now save all the pages for a given domain name whenever selenium visits them, selenium does create a bunch of garbage pages too so you could just delete these via a simple regex parsing and its up to you, from there how to manipulate the saved pages

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号