I've got a large bulk-downloading application written in Python/Mechanize, aiming to download something like 20,000 files. Clearly, any downloader that big is occasionally going to run into some ECONNRESET errors. Now, I know how to handle each of these individually, but there are two problems with that:
- I'd really rather not wrap every single outbound web call in a try/except block.
- Even if I were to do so, there's trouble with knowing how to handle the errors once the exception has been thrown. If the code is just

    data = browser.response().read()

then I know precisely how to deal with it, namely:

    data = None
    while data is None:
        try:
            data = browser.response().read()
        except IOError as e:
            if e.args[1].args[0].errno != errno.ECONNRESET:
                raise
            data = None

but if it's just a random instance of

    browser.follow_link(link)

then how do I know what Mechanize's internal state looks like if an ECONNRESET is thrown somewhere in here? For example, do I need to call browser.back() before I try the code again? What's the proper way to recover from that kind of error?
EDIT: The solution in the accepted answer certainly works, and in my case it turned out to be not so hard to implement. I'm still academically interested, however, in whether there's an error handling mechanism that could result in quicker error catching.
Perhaps place the try..except block higher up in the chain of command:

    import collections
    import errno

    def download_file(url):
        # Bundle together the bunch of browser calls necessary to download one file.
        browser.follow_link(...)
        ...
        response = browser.response()
        data = response.read()
        return data

    urls = collections.deque(urls)
    while urls:
        url = urls.popleft()
        try:
            download_file(url)
        except IOError as err:
            if err.args[1].args[0].errno != errno.ECONNRESET:
                # Not a connection reset: let the error propagate.
                raise
            else:
                # if ECONNRESET error, add the url back to urls to try again later
                urls.append(url)

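One caveat with the loop above: a URL that keeps resetting is re-queued forever. Here is a minimal sketch of a bounded variant, assuming Python 3 (where IOError is an alias of OSError). The names download_all, is_conn_reset and max_attempts are hypothetical and not part of Mechanize. The errno check also assumes the reset reaches you as an exception carrying errno directly; if your library wraps the socket error (as the err.args[1].args[0] unwrapping above suggests), adjust is_conn_reset to match.

```python
import collections
import errno


def is_conn_reset(err):
    # Assumption: the reset surfaces as an exception whose errno attribute
    # is ECONNRESET. If the library wraps the underlying socket error,
    # unwrap it here instead.
    return getattr(err, "errno", None) == errno.ECONNRESET


def download_all(urls, download_file, max_attempts=3):
    """Call download_file(url) for each URL, retrying connection resets.

    Returns (results, failed): a dict mapping url -> data, and a list of
    URLs that still reset after max_attempts tries.
    """
    # Each queue entry carries the number of the attempt about to be made.
    queue = collections.deque((url, 1) for url in urls)
    results = {}
    failed = []
    while queue:
        url, attempt = queue.popleft()
        try:
            results[url] = download_file(url)
        except IOError as err:
            if not is_conn_reset(err):
                # Anything other than a connection reset propagates.
                raise
            if attempt < max_attempts:
                # Retry later, after the other pending URLs.
                queue.append((url, attempt + 1))
            else:
                failed.append(url)
    return results, failed
```

Passing download_file in as an argument keeps all the browser calls for one file inside a single unit that can be restarted from scratch, so you never have to reason about the browser's half-finished state after a reset.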