
Automatically saving web pages requiring login/HTTPS

https://www.devze.com 2023-02-07 01:10 Source: web
I'm trying to automate some data scraping from a website. However, because the user has to go through a login screen, a wget cron job won't work, and because I need to make an HTTPS request, a simple Perl script won't work either. I've tried the "DejaClick" add-on for Firefox to simply replay a series of browser events (logging into the website, navigating to where the interesting data is, downloading the page, etc.), but for some reason the add-on's developers didn't include saving pages as a feature.

Is there any quick way of accomplishing what I'm trying to do here?
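Both answers below assume a scripted HTTP client, so here is a minimal sketch of the general approach using only Python's standard library: keep a cookie jar across requests so the session cookie set by the login POST is sent on later requests, then fetch the protected page over HTTPS and save it. The login URL, the form field names (`username`, `password`), and the page URL are hypothetical placeholders; inspect the site's actual login form to find the real ones.

```python
import urllib.request
import urllib.parse
from http.cookiejar import CookieJar

def make_session():
    """Build an opener that keeps cookies across requests, so the
    session established by the login POST survives into later GETs."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def login_and_save(opener, login_url, creds, page_url, out_path):
    """POST credentials to the login form, then fetch and save a page.

    `creds` is a dict of form fields, e.g. {"username": ..., "password": ...};
    the field names must match the site's login form (hypothetical here).
    """
    data = urllib.parse.urlencode(creds).encode()
    opener.open(login_url, data)           # session cookie lands in the jar
    html = opener.open(page_url).read()    # authenticated HTTPS request
    with open(out_path, "wb") as f:
        f.write(html)
```

Run from cron, this does the same job as the replayed browser session, provided the site uses an ordinary form login rather than JavaScript-driven authentication.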


A while back I used mechanize (wwwsearch.sourceforge.net/mechanize) and found it very helpful. It is built on urllib2, so as I read now it should also work with HTTPS requests; my earlier comment should hopefully prove wrong.


You can record your actions with the IRobotSoft web scraper. See the demo here: http://irobotsoft.com/help/

Then use its saveFile(filename, TargetPage) function to save the target page.
