urllib2 times out but doesn't close socket connection

Developer https://www.devze.com 2022-12-18 06:28 Source: Internet
I'm making a Python URL grabber program. For my purposes, I want it to time out really fast, so I'm doing:

urllib2.urlopen("http://.../", timeout=2)

Of course it times out correctly as it should. However, it doesn't bother to close the connection to the server, so the server thinks the client is still connected. How can I ask urllib2 to just close the connection after it times out?

Running gc.collect() doesn't work, and I'd like to avoid using httplib if I can help it.

The closest I can get is: the first try will time out. The server reports that the connection closed just as the second try times out. Then, the server reports the connection closed just as the third try times out. Ad infinitum.

Many thanks.


I have a suspicion that the socket is still open in the stack frames. When Python raises an exception it stores the stack frames so debuggers and other tools can view the stack and introspect values.

For historical reasons, and now for backwards compatibility, the stack information is stored (on a per-thread basis) in sys (see sys.exc_info(), sys.exc_type and others). This lingering state is one of the things that was removed in Python 3.0: sys.exc_type and its siblings are gone, and the exception state is cleared when the except block ends.

What that means for you is that the stack is still alive and referenced. The stack contains the local data for some function that holds the open socket. That's why the socket isn't closed yet. Only when the stack trace is released will everything be gc'ed.
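The mechanism can be seen in isolation with a small sketch. It is written for Python 3 on CPython (an assumption, since the question uses Python 2's urllib2), so it has to hold on to the exception object explicitly to mimic what Python 2 does implicitly. `Resource`, `failing_call` and `traceback_keeps_local_alive` are hypothetical names used only for illustration; `Resource` stands in for the open socket.

```python
import weakref


class Resource(object):
    """Hypothetical stand-in for an open socket."""


def failing_call(registry):
    # The resource is only a local variable of this frame.
    res = Resource()
    registry.append(weakref.ref(res))
    raise RuntimeError("simulated timeout")


def traceback_keeps_local_alive():
    registry = []
    saved = None
    try:
        failing_call(registry)
    except RuntimeError as exc:
        # Holding the exception also holds its traceback, which holds the
        # frame of failing_call and therefore the local `res`.
        saved = exc

    alive_while_held = registry[0]() is not None

    # Dropping the last reference to the exception releases the traceback,
    # the frame, and finally the resource (immediately, under CPython's
    # reference counting).
    saved = None
    alive_after_release = registry[0]() is not None

    return alive_while_held, alive_after_release
```

On CPython this should report that the resource stays alive while the exception is held and goes away once it is released, which is the same shape as a socket that only closes when the next exception replaces the stored one.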

To test if that's the case, insert something like

try:
  1/0
except ZeroDivisionError:
  pass

in your except clause. That's a quick way to replace the current exception with something else.


This is SUCH a hack, but the following code works. If the request is in another function AND it does not raise an exception, then the socket is always closed.

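import socket
import urllib2

# Both functions take self, so they are meant to live on the grabber class.
def _fetch(self, url):
    # Keep the request in its own function and swallow the timeout here, so
    # this frame (and the socket it references) is released on return.
    try:
        return urllib2.urlopen(urllib2.Request(url), timeout=5).read()
    except urllib2.URLError as e:
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # re-raise anything else with its original traceback

def fetch(self, url):
    x = None
    while x is None:
        x = self._fetch(url)
        if x is None:
            print "Timeout"  # report only the attempts that actually timed out
    return x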

Does ANYONE have a better way?
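One possible direction, offered as a sketch rather than a confirmed fix: on Python 3 (an assumption, the code above is Python 2) the same idea can be written with `urllib.request`, a bounded number of attempts, and an explicit close on the success path. The names `fetch_once` and `fetch`, the `attempts` parameter and the injectable `opener` argument are all hypothetical; `opener` exists only so the retry logic can be exercised without a network.

```python
import socket
import urllib.error
import urllib.request
from contextlib import closing


def fetch_once(url, timeout=5, opener=urllib.request.urlopen):
    """Return the response body, or None if this attempt timed out."""
    try:
        # closing() guarantees response.close() once the body has been read.
        with closing(opener(url, timeout=timeout)) as response:
            return response.read()
    except socket.timeout:
        # A timeout while reading the body is raised directly.
        return None
    except urllib.error.URLError as e:
        # A timeout while connecting arrives wrapped in URLError.reason.
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # anything else is a real error


def fetch(url, attempts=3, timeout=5, opener=urllib.request.urlopen):
    """Try up to `attempts` times; return None if every attempt timed out."""
    for _ in range(attempts):
        body = fetch_once(url, timeout=timeout, opener=opener)
        if body is not None:
            return body
    return None
```

Because Python 3 clears the exception state when the except block ends, the traceback (and any socket its frames reference) should be released as soon as `fetch_once` returns, without the throwaway exception trick. Bounding the loop also avoids retrying forever against a server that never answers.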
