How can I get the final redirect URL when using urllib2.urlopen?

Developer https://www.devze.com 2023-01-12 16:59 Source: Internet
I'm using the urllib2.urlopen method to open a URL and fetch the markup of a webpage. Some of these sites redirect me using the 301/302 redirects. I would like to know the final URL that I've been redirected to. How can I get this?


Call the .geturl() method of the file object returned. Per the urllib2 docs:

geturl() — return the URL of the resource retrieved, commonly used to determine if a redirect was followed

Example:

import urllib2
response = urllib2.urlopen('http://tinyurl.com/5b2su2')
response.geturl() # 'http://stackoverflow.com/'


The return value of urllib2.urlopen has a geturl() method, which returns the actual URL (i.e. the one reached after the last redirect).


e.g.: urllib2.urlopen('ORIGINAL LINK').geturl()

urllib2.urlopen(urllib2.Request('ORIGINAL LINK')).geturl()
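The geturl() behaviour can be checked without network access. What follows is a minimal sketch, not a definitive implementation. It assumes Python 3, where urllib2.urlopen is available as urllib.request.urlopen. The RedirectHandler class, the final_url helper and the /start and /final paths are hypothetical names made up for this illustration: a throwaway local server answers /start with a 302 to /final, and geturl() reports where the request ended up.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import ProxyHandler, build_opener, install_opener, urlopen


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical test server: /start answers with a 302 to /final."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"<html>done</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet: suppress per-request logging.
        pass


def final_url(url):
    """Open url, follow any redirects, and return the URL finally retrieved."""
    response = urlopen(url)
    try:
        return response.geturl()
    finally:
        response.close()


# Assumption for this demo only: ignore any system proxy settings so the
# request to the loopback server is made directly.
install_opener(build_opener(ProxyHandler({})))

# Bind to port 0 so the OS picks a free port, then serve in the background.
server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

base = "http://127.0.0.1:%d" % server.server_port
start = base + "/start"
resolved = final_url(start)

server.shutdown()
server.server_close()

print(resolved)  # the /final URL on the local server, not /start
```

On Python 2 the same final_url helper should work with urllib2.urlopen in place of urllib.request.urlopen, since the returned object exposes geturl() there as well.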


You can use HttpLib2 with follow_all_redirects = True and get the content-location from the response headers. See my answer to 'httplib is not getting all the redirect codes' for an example.
