What happens to the variable in this case when it is overwritten with self?

Developer https://www.devze.com 2023-02-13 06:58 Source: Internet
I am downloading URLs in Python and need to detect 404s, so after some searching I came up with:

import urllib
class MyUrlOpener(urllib.FancyURLopener):
    def retrieve(self, url, filename=None, reporthook=None, data=None):
        self.file_was_found = True
        val = urllib.FancyURLopener.retrieve(self, url, filename, reporthook, data)        
        return val

    def http_error_404(url, fp, errcode, errmsg, headers, data):
        url.file_was_found = False


def download_file(url, saveas):
    urlaccess = MyUrlOpener()
    localFile, headers = urlaccess.retrieve(url, saveas)
    return urlaccess.file_was_found

My question: if you look at the source code (Python 2.7) for FancyURLopener, you see:

def http_error(self, url, fp, errcode, errmsg, headers, data=None):
    """Handle http errors.
    Derived class can override this, or provide specific handlers
    named http_error_DDD where DDD is the 3-digit error code."""
    # First check if there's a specific handler for this error
    name = 'http_error_%d' % errcode
    if hasattr(self, name):
        method = getattr(self, name)
        if data is None:
            result = method(url, fp, errcode, errmsg, headers)
        else:
            result = method(url, fp, errcode, errmsg, headers, data)
        if result: return result
    return self.http_error_default(url, fp, errcode, errmsg, headers)

This passes url as the first argument, not self. I thought the first parameter of a method was always a reference to the class instance (by convention), and my code confirms this. So what happens to the url value?

UPDATE: It turns out that data was None, so http_error made the first call (the one that omits data). That foiled my attempts to add the self parameter manually, because my handler then expected one argument more than it received. As soon as I gave data a default of None in my http_error_404 signature, all was well (the call that omits data falls back to the default).

The fixed / correct signature is def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
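
The effect of the missing default can be reproduced without any network access. The sketch below is a stand-in, not the real urllib code: Dispatcher, NoDefault, WithDefault and try_404 are hypothetical names, and http_error only mimics the getattr-based lookup quoted above.

```python
class Dispatcher(object):
    """Hypothetical stand-in that mimics the lookup in http_error."""

    def http_error(self, url, fp, errcode, errmsg, headers, data=None):
        name = 'http_error_%d' % errcode
        if hasattr(self, name):
            method = getattr(self, name)  # bound method: self is already attached
            if data is None:
                return method(url, fp, errcode, errmsg, headers)
            return method(url, fp, errcode, errmsg, headers, data)
        return None


class NoDefault(Dispatcher):
    # data has no default, so the five-argument call above cannot bind it
    def http_error_404(self, url, fp, errcode, errmsg, headers, data):
        return 'handled'


class WithDefault(Dispatcher):
    # data falls back to None when the dispatcher omits it
    def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
        return 'handled'


def try_404(opener):
    """Return the handler's result, or the name of the exception it raised."""
    try:
        return opener.http_error('http://example.invalid/x', None, 404,
                                 'Not Found', {})
    except TypeError as exc:
        return type(exc).__name__
```

Under these assumptions, try_404(NoDefault()) reports a TypeError, while try_404(WithDefault()) reaches the handler.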


In Python, when a method is called on an instance, the interpreter passes the instance as the first argument, and all of the other arguments are shifted down one place automatically.

In other words the Python interpreter rewrites:

urlaccess.retrieve(url, saveas)

into something that looks like this:

MyUrlOpener.retrieve(urlaccess, url, saveas)

So you don't have to do it yourself. However, since

explicit is better than implicit

any instance methods you declare for a Python object must specify explicitly that they take the instance of the object as their first argument even though Python will pass that argument without any action on the part of the programmer.

The first argument does not have to be called self ... that is only a convention.
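
A minimal sketch of both points, using a hypothetical Greeter class that is not part of the question: calling a method through the instance and calling it through the class with the instance passed explicitly give the same result, and the first parameter can carry any name.

```python
class Greeter(object):
    def __init__(self, name):
        self.name = name

    # The first parameter is conventionally named self...
    def greet(self, greeting):
        return '%s, %s' % (greeting, self.name)

    # ...but any name works, because the instance is passed by position.
    def shout(this, greeting):
        return ('%s, %s!' % (greeting, this.name)).upper()


g = Greeter('world')

via_instance = g.greet('hello')        # instance supplied implicitly
via_class = Greeter.greet(g, 'hello')  # instance supplied explicitly
shouted = g.shout('hello')
```

Here via_instance and via_class hold the same string, which is the equivalence described above.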


So, to actually answer your question though (as mluebke did) -- you need to specify the self argument.

def http_error_404(url, fp, errcode, errmsg, headers, data):
    url.file_was_found = False
    # Python is treating `url` as `self`
    # Therefore the URL is being saved in `fp`, `fp` in `errcode`, etc.
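
The shift described in those comments can be observed directly. This is only an illustration with a hypothetical Handler class and placeholder argument values; it returns what each parameter actually received when self is left out of the signature.

```python
class Handler(object):
    # self is missing on purpose, mirroring the handler in the question
    def on_error(url, fp, errcode, errmsg, headers, data):
        return {'url': url, 'fp': fp, 'errcode': errcode,
                'errmsg': errmsg, 'headers': headers, 'data': data}


h = Handler()
# Five arguments are passed, as http_error does when data is None.
seen = h.on_error('http://example.invalid/x', 'FP', 404, 'Not Found', 'HEADERS')
```

In this sketch seen['url'] is the Handler instance itself, and every real argument sits one parameter later than its name suggests.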

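To fix this problem, add a first parameter to pick up the instance, and give data a default so the call that omits it still works.

def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
    self.file_was_found = False
    # self now receives the instance and url receives the URL.
    # data needs the None default because http_error omits it when there is no data.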


self is explicitly listed in the method definition, but implicitly passed when the method is called. Change your function to look like this and all your variables will start to line up again.

def http_error_404(self, url, fp, errcode, errmsg, headers, data=None):
    self.file_was_found = False