I'm using the urllib2.urlopen method to open a URL and fetch the markup of a webpage. Some of these sites redirect me with 301/302 responses. I would like to know the final URL that I've been redirected to. How can I get this?
Call the .geturl() method of the file-like object that urlopen returns. Per the urllib2 docs:
geturl(): return the URL of the resource retrieved, commonly used to determine if a redirect was followed
Example:
import urllib2
response = urllib2.urlopen('http://tinyurl.com/5b2su2')
response.geturl() # 'http://stackoverflow.com/'
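If you want to check this behaviour without depending on an external site, here is a minimal sketch. It assumes Python 3, where urllib2's urlopen is available as urllib.request.urlopen. The RedirectHandler class, the /start and /final paths, and the final_url and demo helpers are made up for illustration and are not part of urllib.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local handler: /start answers 302 and points at /final."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.end_headers()
        else:
            body = b"done"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging in the demo


def final_url(url):
    """Open url (urlopen follows redirects) and return the URL finally retrieved."""
    response = urlopen(url)
    try:
        return response.geturl()
    finally:
        response.close()


def demo():
    """Start a throwaway local server and resolve its redirecting URL."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 = any free port
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        base = "http://127.0.0.1:%d" % server.server_port
        return base, final_url(base + "/start")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the server's base URL together with the URL reported by geturl(), which should be the /final address, not the /start one that was requested.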
The return value of urllib2.urlopen has a geturl() method, which should return the actual URL (i.e. the target of the last redirect).
For example:
urllib2.urlopen('ORIGINAL LINK').geturl()
urllib2.urlopen(urllib2.Request('ORIGINAL LINK')).geturl()
You can use httplib2 with follow_all_redirects = True and get the content-location from the response headers. See my answer to 'httplib is not getting all the redirect codes' for an example.