
How to extract text from a web page that requires logging in, using Python and Beautiful Soup?

Developer https://www.devze.com 2023-03-09 20:27 Source: Internet
I have to retrieve some text from a website called morningstar.com. To access that data I have to log in. Once I log in and provide the URL of the web page, I get the HTML of a normal user (not logged in). As a result I am not able to access that information. Any solutions?


BeautifulSoup is for parsing HTML once you've already fetched it. You can fetch the HTML using any standard URL fetching library. I prefer curl, but given how you tagged your post, Python's built-in urllib2 also works well.

If you're saying that after logging in the response HTML is the same as for those who are not logged in, I'm going to guess that your login is failing for some reason. If you are using urllib2, are you making sure to store the cookie properly after your first login and then passing this cookie to urllib2 when you send the request for the data?
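
To make the cookie point concrete, here is a minimal sketch of the login-then-fetch pattern. It uses Python 3's urllib.request and http.cookiejar, the successors to urllib2 and cookielib. The URLs, the form field names, and the helper names make_cookie_opener and login_and_fetch are all hypothetical; the real login URL and field names have to be read from the site's own login form, and some sites need extra hidden fields or headers that this sketch does not handle.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical placeholders: replace with the values from the real login form.
LOGIN_URL = "https://example.com/login"
DATA_URL = "https://example.com/members/report"


def make_cookie_opener():
    """Return an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_and_fetch(opener, login_url, data_url, credentials):
    """POST the login form, then fetch a page with the same opener."""
    form = urllib.parse.urlencode(credentials).encode("utf-8")
    # Passing data makes this a POST; any Set-Cookie headers land in the jar.
    with opener.open(login_url, data=form) as resp:
        resp.read()
    # Same opener, so the stored session cookie is sent automatically.
    with opener.open(data_url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

A call would look something like `opener, jar = make_cookie_opener()` followed by `html = login_and_fetch(opener, LOGIN_URL, DATA_URL, {"username": "...", "password": "..."})`, where the two field names are again assumptions. If `jar` is still empty after the login request, the login most likely failed, which would explain getting the logged-out page. The returned `html` string is what you would then hand to BeautifulSoup.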

It would help if you posted the code you are using to make the two requests (the initial login, and the attempt to fetch the data).
