How to download a large file in binary mode in Python?

Developer https://www.devze.com 2023-02-12 22:25 Source: the web
I am writing a download function in Python. The files are larger than 1 GB. The server runs Linux, the HTTP server is Karrigell, and the client is a browser (Firefox or IE). I have run into a serious problem.

At first, I used stdout to send the file content.

import os
import sys

# RESPONSE, STDOUT and path are not defined here; they are assumed
# to come from the surrounding Karrigell script.
size = os.path.getsize(path)

RESPONSE['Pragma'] = 'public'
RESPONSE['Expires'] = '0'
RESPONSE['Cache-Control'] = 'must-revalidate, pre-check=0'
RESPONSE['Content-Disposition'] = 'attachment; filename="' + os.path.basename(path) + '"'
RESPONSE['Content-type'] = "application/octet-stream"
RESPONSE['Content-Transfer-Encoding'] = 'binary'
RESPONSE['Content-length'] = str(size)

sys.stdout.flush()
chunk_size = 10000
# Open the file once and make sure it is closed when the loop ends.
handle = open(path, "rb")
try:
    while True:
        buffer = handle.read(chunk_size)
        if not buffer:
            break
        STDOUT(buffer)
finally:
    handle.close()
sys.stdout.flush()

The problem is that the server runs out of memory! As far as I know, stdout writes the content to memory first, and only then is it sent from memory to the socket.
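
To illustrate the buffering point, here is a minimal Python 3 sketch of copying a file to an output stream one chunk at a time, flushing after every write so that this code holds only one chunk at any moment. The names `iter_file_chunks` and `stream_file` are hypothetical, and whether the framework underneath (Karrigell here) still buffers the whole response is outside what this sketch controls.

```python
def iter_file_chunks(path, chunk_size=64 * 1024):
    """Yield the file's bytes in pieces of at most chunk_size bytes."""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


def stream_file(path, out, chunk_size=64 * 1024):
    """Write the file to a binary stream chunk by chunk; return bytes written."""
    total = 0
    for chunk in iter_file_chunks(path, chunk_size):
        out.write(chunk)
        out.flush()  # hand the chunk on instead of letting it pile up here
        total += len(chunk)
    return total
```

If the `out` object itself accumulates everything in memory, as the question suspects of stdout here, chunked reads alone will not reduce the server's memory use.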

So I modified the function to send the content directly to the socket, using the py-sendfile module: http://code.google.com/p/py-sendfile/

import errno
import os
import sendfile  # the py-sendfile module

file = open(path, 'rb')
size = os.path.getsize(path)

sock = REQUEST_HANDLER.sock
sock.sendall("""HTTP/1.1 200 OK\r\nPragma: no-cache\r\nExpires: 0\r\nCache-Control: no-cache, no-store\r\nContent-Disposition: attachment; filename="%s"\r\nContent-Type: application/octet-stream\r\nContent-Length: %u\r\nContent-Range: bytes 0-4096/%u\r\nLocation: "%s"\r\n\r\n""" % (os.path.basename(path), size, size, os.path.basename(path)))

offset = 0
nbytes = 4096
while 1:
    try:
        sent = sendfile.sendfile(sock.fileno(), file.fileno(), offset, nbytes)
    except OSError as err:
        if err.errno in (errno.EAGAIN, errno.EBUSY):  # retry
            continue
        raise
    else:
        if sent == 0:
            break    # done
        offset += sent
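
For comparison, Python 3.5 and later ship `socket.sendfile()` in the standard library. It uses `os.sendfile` where available and falls back to plain `send()` otherwise. The following is a minimal sketch, assuming a connected, blocking stream socket. The helper name `send_file_over_socket` is hypothetical, and this is not the py-sendfile API used above.

```python
def send_file_over_socket(sock, path):
    """Send the whole file through a connected, blocking stream socket.

    Returns the number of bytes sent.
    """
    with open(path, "rb") as handle:
        # socket.sendfile() loops internally until end of file is reached.
        return sock.sendfile(handle)
```

The call requires the file to be opened in binary mode and raises `ValueError` on a non-blocking socket, so a server built around non-blocking I/O would still need its own loop.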

This time the server's memory usage is fine, but the browser dies! The browser's memory usage rises quickly and is not freed until the socket has received the whole file.

I don't know how to deal with these problems. I think the second idea is right: send the content directly to the socket. But why can't the browser free memory while it is receiving data?
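
One detail in the raw response above may be worth checking, though it is not confirmed as the cause of the browser's memory growth. `Content-Range` is normally sent with a `206 Partial Content` reply to a range request, and `Location` with redirects or `201 Created`. On a plain `200 OK`, the declared range `0-4096` conflicts with `Content-Length`. A minimal Python 3 sketch of a header block for a full-file download follows; the function name `build_download_headers` is hypothetical.

```python
def build_download_headers(filename, size):
    """Return the status line and headers for a full-file download, as bytes."""
    # Escape backslashes and double quotes so the quoted filename stays valid.
    safe_name = filename.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "HTTP/1.1 200 OK",
        "Cache-Control: no-cache, no-store",
        "Content-Type: application/octet-stream",
        'Content-Disposition: attachment; filename="%s"' % safe_name,
        "Content-Length: %d" % size,
    ]
    # A blank line terminates the header section.
    # Filenames outside Latin-1 are not handled in this sketch.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
```

The result could be passed to `sock.sendall()` before the file body is sent.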


You should try to download the file in chunks. Here is an example that works for me using urllib2 (Python 2):

import os
import urllib2

def downloadChunks(url):
    """Helper to download large files.

    The only argument is a URL. The file is saved to a temp
    directory, downloaded in chunks, and the percentage
    downloaded so far is printed after each chunk.
    """

    baseFile = os.path.basename(url)

    # umask 0o002: new files are created without world-write permission
    os.umask(0o002)
    temp_path = "/tmp/"
    try:
        file = os.path.join(temp_path, baseFile)

        req = urllib2.urlopen(url)
        # assumes the server sends a Content-Length header
        total_size = int(req.info().getheader('Content-Length').strip())
        downloaded = 0
        CHUNK = 256 * 10240  # 2,621,440 bytes (2.5 MiB)
        with open(file, 'wb') as fp:
            while True:
                chunk = req.read(CHUNK)
                if not chunk:
                    break
                fp.write(chunk)
                downloaded += len(chunk)
                # multiply first: in Python 2, downloaded / total_size
                # is integer division and stays 0 until the very end
                print(downloaded * 100 // total_size)
    except urllib2.HTTPError as e:
        print("HTTP Error: %s %s" % (e.code, url))
        return False
    except urllib2.URLError as e:
        print("URL Error: %s %s" % (e.reason, url))
        return False

    return file
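
On Python 3, `urllib2` no longer exists and its functionality lives in `urllib.request`. The sketch below is one possible port of the same idea, not the answerer's code. The function name `download_in_chunks` and the `dest_dir` parameter are hypothetical, and a missing `Content-Length` is treated as an unknown size rather than an error.

```python
import os
import urllib.request
from urllib.parse import unquote, urlparse


def download_in_chunks(url, dest_dir, chunk_size=1024 * 1024):
    """Download url into dest_dir one chunk at a time; return the local path."""
    # Fall back to a fixed name when the URL path has no final component.
    name = os.path.basename(unquote(urlparse(url).path)) or "download.bin"
    dest = os.path.join(dest_dir, name)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as fp:
        length = resp.headers.get("Content-Length")
        total = int(length) if length else None  # None: size unknown
        downloaded = 0
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            fp.write(chunk)
            downloaded += len(chunk)
            if total:
                # Percentage downloaded so far.
                print("%d%%" % (downloaded * 100 // total))
    return dest
```

Unlike the original, this version lets `urllib.error.HTTPError` and `urllib.error.URLError` propagate to the caller instead of returning `False`.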