Submitting queries to, and scraping results from aspx pages using python?

开发者 https://www.devze.com 2022-12-16 20:33 Source: web
I am trying to get results for a batch of queries to this demographics tools page: http://adlab.microsoft.com/Demographics-Prediction/DPUI.aspx

The POST action on the form calls the same page (_self) and is probably posting some event data. I read in another post here on Stack Overflow that aspx pages typically need some viewstate and validation data. Do I simply save these from a request and re-send them in a POST request?

Or is there a cleaner way to do this? One of those aspx viewstate parameters is about 1000 characters long, and the incredible ugliness of pasting that into my code makes me think there HAS to be a better way. Any and all references to stuff I can read up on will be helpful, thanks!


Perhaps mechanize may be of use.


Use urllib2. Your POST data is a simple Python dictionary. Very easy to edit and maintain.
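As a rough sketch of the "POST data is a dictionary" idea: urllib2 is the Python 2 module, and in Python 3 the same functionality lives in urllib.request and urllib.parse, which is what this uses. The URL and the field names ("query", "submit") are placeholders, not the real form's names; you would read those off the page's HTML.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical field names; the real ones come from the form's HTML.
form_data = {
    "query": "digital camera",
    "submit": "Predict",
}


def build_post_request(url, fields):
    """Encode a dict of form fields and wrap it in a request object."""
    body = urlencode(fields).encode("utf-8")
    # Supplying a data argument makes urllib issue a POST instead of a GET.
    return Request(url, data=body)


# Placeholder URL, used only to show the shape of the request.
request = build_post_request("http://example.com/form.aspx", form_data)
```

Nothing is sent until the request is passed to urllib.request.urlopen (or an opener), so the dictionary can be edited freely before that point.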

If your form contains hidden fields -- some of which are encoded -- then you need to do a GET to get the form and the various hidden field seed values.

Once you GET the form, you can add the necessary input values to the given, hidden values and POST the response back again.
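One way to avoid pasting a long viewstate string into your code is to pull the hidden fields out of the GET response and merge your own inputs on top. A minimal sketch using only the standard library follows. __VIEWSTATE and __EVENTVALIDATION are the usual ASP.NET hidden field names, but the sample HTML, the values in it, and the "query" field are made up for illustration.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_payload(form_html, user_inputs):
    """Start from the page's hidden values, then add the user's own inputs."""
    parser = HiddenFieldParser()
    parser.feed(form_html)
    payload = dict(parser.fields)
    payload.update(user_inputs)
    return payload


# Made-up stand-in for the HTML returned by the initial GET.
SAMPLE_FORM = """
<form method="post" action="DPUI.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz" />
  <input type="hidden" name="__EVENTVALIDATION" value="wEWAgK" />
  <input type="text" name="query" />
</form>
"""

payload = build_payload(SAMPLE_FORM, {"query": "digital camera"})
```

The resulting dictionary is what gets URL-encoded and POSTed back, so the hidden values are always whatever the server last sent rather than something hard-coded.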

Also, you'll have to be sure that you handle any cookies. urllib2 will help with that, also.
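For the cookie part, a sketch along these lines might do, again with the Python 3 equivalents of urllib2 (urllib.request plus http.cookiejar). The helper names make_session and get_then_post, and the build_fields callback, are my own inventions for illustration; the callback is assumed to turn the fetched HTML into the dictionary of fields to send.

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, build_opener


def make_session():
    """Return an opener that stores cookies and replays them on later requests."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    return opener, jar


def get_then_post(opener, url, build_fields):
    """GET the page so cookies and hidden values are set up, then POST back.

    build_fields is a hypothetical callback: it receives the form HTML and
    returns the dict of fields to submit.
    """
    with opener.open(url) as response:
        form_html = response.read().decode("utf-8", errors="replace")
    body = urlencode(build_fields(form_html)).encode("utf-8")
    # Same opener, so any cookies from the GET are sent with the POST.
    with opener.open(url, data=body) as response:
        return response.read().decode("utf-8", errors="replace")


opener, jar = make_session()
```

For a batch of queries you would reuse the same opener and call get_then_post once per query.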

After all, that's all a browser does, and it works in a browser. Browsers don't know ASPX from CGI from WSGI, so there's no magic because it's ASPX. You sometimes have to do a GET before a POST to get values and cookies set up properly.


I've used a combination of requests and BeautifulSoup4 for a similar task.
