urllib2 redirect empty page (though code is 200 and geturl() points to new page)

Developer (https://www.devze.com), 2023-01-25 10:48, source: web
I am trying to access a web page using urllib2 and the automatic redirect in urllib2 does not seem to retrieve the entire page. Here is my code:

import urllib2  # Python 2 only

# link is the URL under test (see cases a and b below)
request = urllib2.Request(link)
request.add_header('User-Agent','...')
opener = urllib2.build_opener()

page = opener.open(request)
print(page.code)
print(page.geturl())
print(page.read())

a) When link = 'https://www.google.com', it prints:

200
https://www.google.com
<!doctype...> Etc. Etc. </script>

b) When link = 'https://www.xyz.com/a_link_which_is_redirected.html', it prints:

200
https://the_new_link
<blank>

However, if I open the link from b) in a web browser, it correctly displays a page with a form.


View the source of the Google page - it really does end with a script tag. They leave off some of the closing tags because browsers can still interpret it correctly and it saves bandwidth.

Here are some test redirect pages. Which of those do not work for you?
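
To narrow down case b), it may help to compare what the server says it sent with what was actually read. Below is a minimal sketch, assuming Python 3, where urllib.request and http.cookiejar stand in for urllib2 and cookielib. The helper names build_cookie_opener, summarize and fetch_summary and the default User-Agent string are hypothetical. Keeping cookies across the redirect is only one guess at why the body comes back empty, not a confirmed cause.

```python
# Python 3 sketch; in Python 2 the equivalent classes live in urllib2 and cookielib.
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return an opener that stores cookies and resends them on redirects."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def summarize(status, final_url, headers, body):
    """Collect the response details that may explain an empty body."""
    return {
        "status": status,
        "final_url": final_url,
        "content_length": headers.get("Content-Length"),
        "content_encoding": headers.get("Content-Encoding"),
        "body_bytes": len(body),
    }


def fetch_summary(link, user_agent="diagnostic-sketch/0.1"):
    """Fetch link with a cookie-aware opener and summarize the final response.

    The user_agent default is a placeholder, not a recommended value.
    """
    opener, jar = build_cookie_opener()
    request = urllib.request.Request(link, headers={"User-Agent": user_agent})
    with opener.open(request) as page:
        body = page.read()
        info = summarize(page.status, page.geturl(), page.headers, body)
    # Names of cookies set anywhere along the redirect chain.
    info["cookies"] = [cookie.name for cookie in jar]
    return info
```

Calling fetch_summary(link) on the URL from b) and comparing body_bytes with content_length shows whether the server really sent an empty body or whether something was lost on the client side. A non-empty cookies list would suggest the site sets a cookie during the redirect and may expect it back.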
