
Recovering from ECONNRESET in Python/Mechanize

Developer https://www.devze.com 2023-02-07 14:11 Source: web

I've got a large bulk downloading application written in Python/Mechanize, aiming to download something like 20,000 files. Clearly, any downloader that big is occasionally going to run into some ECONNRESET errors. Now, I know how to handle each of these individually, but there are two problems with that:

  1. I'd really rather not wrap every single outbound web call in a try/except block.
  2. Even if I were to do so, there's trouble with knowing how to handle the errors once the exception has been thrown. If the code is just

    data = browser.response().read()
    

    then I know precisely how to deal with it, namely:

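    import errno

    data = None
    while data is None:
        try:
            data = browser.response().read()
        except IOError as e:
            # Re-raise anything that is not a connection reset.
            if e.args[1].args[0].errno != errno.ECONNRESET:
                raise
            data = None  # connection reset: loop and read again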
    

    but if it's just a random instance of

    browser.follow_link(link)
    

    then how do I know what Mechanize's internal state looks like if an ECONNRESET is thrown somewhere in here? For example, do I need to call browser.back() before I try the code again? What's the proper way to recover from that kind of error?

EDIT: The solution in the accepted answer certainly works, and in my case it turned out to be not so hard to implement. I'm still academically interested, however, in whether there's an error handling mechanism that could result in quicker error catching.


Perhaps place the try..except block higher up in the chain of command:

import collections
import errno

def download_file(url):
    # Bundle together the bunch of browser calls necessary to download one file.
    browser.follow_link(...)
    ...
    response = browser.response()
    data = response.read()
    return data

urls = collections.deque(urls)

while urls:
    url = urls.popleft()
    try:
        download_file(url)
    except IOError as err:
        if err.args[1].args[0].errno != errno.ECONNRESET:
            raise
        # On ECONNRESET, put the url back on the queue to try again later.
        urls.append(url)
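
Note that the loop above retries a failing url forever. If you'd rather cap the retries and avoid hand-writing the try/except at each call site, one option is a small decorator. This is only a sketch: `is_connection_reset` and `retry_on_reset` are hypothetical helper names (not part of Mechanize), and the attempt count and delay are arbitrary. Rather than hard-coding `err.args[1].args[0]`, the helper walks the nested exception args looking for an `errno` equal to `ECONNRESET`.

```python
import errno
import functools
import time


def is_connection_reset(exc):
    """Return True if exc, or any exception nested in its args, has errno ECONNRESET."""
    seen = set()
    stack = [exc]
    while stack:
        item = stack.pop()
        if id(item) in seen:
            continue
        seen.add(id(item))
        if getattr(item, "errno", None) == errno.ECONNRESET:
            return True
        # Only exceptions are expanded; plain strings and ints are skipped.
        if isinstance(item, BaseException):
            stack.extend(item.args)
    return False


def retry_on_reset(max_attempts=3, delay=1.0):
    """Retry the wrapped function on ECONNRESET, at most max_attempts times."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except IOError as err:
                    # Anything other than a reset, or the final failure, propagates.
                    if not is_connection_reset(err) or attempt == max_attempts:
                        raise
                    time.sleep(delay * attempt)  # simple linear backoff
        return wrapper
    return decorator
```

Decorating `download_file` with `@retry_on_reset()` would rerun the whole bundle of browser calls on a reset. That sidesteps the question of what state the browser is in after a failed `follow_link`, provided `download_file` starts from a known point (for example by opening a URL directly instead of relying on the current page).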