How to close file objects when downloading files over FTP using Twisted?


I've got the following code:

for f in fileListProtocol.files:
    if f['filetype'] == '-':
        filename = os.path.join(directory['filename'], f['filename'])
        print 'Downloading %s...' % (filename)
        newFile = open(filename, 'w+')
        d = ftpClient.retrieveFile(filename, FileConsumer(newFile))
        d.addCallback(closeFile, newFile)

Unfortunately, after downloading several hundred of the 1000+ files in the directory in question I get an IOError about too many open files. Why is this, when I should be closing each file after it's been downloaded? If there's a more idiomatic way to approach the whole task of downloading lots of files, I'd love to hear it. Thanks.

Update: Jean-Paul's DeferredSemaphore example plus Matt's FTPFile did the trick. For some reason, using a Cooperator instead of a DeferredSemaphore would download a few files and then fail because the FTP connection had died.


Assuming that you're using FTPClient from twisted.protocols.ftp... and I certainly hesitate before contradicting JP...

It seems that the FileConsumer class you're passing to retrieveFile will be adapted to IProtocol by twisted.internet.protocol.ConsumerToProtocolAdapter, which doesn't call unregisterProducer, so FileConsumer doesn't close the file object.

I've knocked up a quick protocol that you can use to receive the files; I think it should only open the file when appropriate. It's totally untested. You'd use it in place of FileConsumer in your code above, and you won't need the addCallback.

from twisted.python import log
from twisted.internet import interfaces
from zope.interface import implements

class FTPFile(object):
    """
    A consumer for FTP input that writes data to a file.

    @ivar filename: a filename to be opened for writing.
    """

    implements(interfaces.IProtocol)

    def __init__(self, filename):
        self.fObj = None
        self.filename = filename

    def makeConnection(self, transport):
        # Open the target file only once the data connection is made.
        self.fObj = open(self.filename, 'wb')
        log.msg('Opened %s for writing' % self.filename)

    def connectionLost(self, reason):
        # Close the file when the transfer finishes (or fails).
        if self.fObj is not None:
            self.fObj.close()
            log.msg('Closed %s' % self.filename)

    def dataReceived(self, bytes):
        self.fObj.write(bytes)
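Used in the question's loop, the call would then look something like this (untested; the surrounding loop stays the same):

# FTPFile opens the file in makeConnection and closes it in connectionLost,
# so the closeFile callback from the question is no longer needed.
d = ftpClient.retrieveFile(filename, FTPFile(filename))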


You're opening every file in fileListProtocol.files essentially simultaneously: the loop runs to completion, opening each file, before any of the downloads (and therefore any of the closeFile callbacks) have finished. So you have len(fileListProtocol.files) files open at the beginning of the process, and if there are too many files in that list, you'll hit the limit on open files.

You probably want to limit yourself to some fairly small number of parallel downloads at once (if FTP even supports parallel downloads, which I'm not entirely certain is the case).

http://jcalderone.livejournal.com/24285.html and "Queue remote calls to a Python Twisted perspective broker?" may be of some help in figuring out how to limit the number of downloads you start in parallel.
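For illustration, here is a rough, untested sketch of the approach the asker settled on in the update above: a DeferredSemaphore limiting how many retrieveFile calls are in flight at once, combined with the FTPFile protocol from the other answer. It assumes ftpClient, fileListProtocol and directory are set up as in the question; the limit of 3 concurrent transfers is an arbitrary choice.

import os

from twisted.internet import defer

# Allow at most 3 transfers in flight at any one time (arbitrary limit).
semaphore = defer.DeferredSemaphore(3)

def downloadFile(filename):
    # FTPFile opens and closes the file itself, so only the files
    # currently being transferred are open at any moment.
    return ftpClient.retrieveFile(filename, FTPFile(filename))

downloads = []
for f in fileListProtocol.files:
    if f['filetype'] == '-':
        filename = os.path.join(directory['filename'], f['filename'])
        # run() waits for a free slot, calls downloadFile, and releases
        # the slot when the returned Deferred fires.
        downloads.append(semaphore.run(downloadFile, filename))

# allDone fires once every transfer has finished.
allDone = defer.gatherResults(downloads)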
