Download and unzip file with Python

开发者 https://www.devze.com 2023-03-23 19:19 Source: Internet
I am trying to download and open a zipped file and seem to be having trouble using a file type handle with zipfile. I'm getting the error "AttributeError: addinfourl instance has no attribute 'seek'" when running this:

import zipfile
import urllib2

def download(url,directory,name):
 webfile = urllib2.urlopen('http://www.sec.gov'+url)
 webfile2 = zipfile.ZipFile(webfile)
 content = zipfile.ZipFile.open(webfile2).read()
 localfile = open(directory+name, 'w')
 localfile.write(content)
 localfile.close()
 return()

download(link.get("href"),'./fails_data', link.text)


Putting things together, the following retrieves the content of the first file within a zipped file from a website:

import urllib.request
import zipfile
    
url = 'http://www.gutenberg.lib.md.us/4/8/8/2/48824/48824-8.zip'
filehandle, _ = urllib.request.urlretrieve(url)
zip_file_object = zipfile.ZipFile(filehandle, 'r')
first_file = zip_file_object.namelist()[0]
file = zip_file_object.open(first_file)
content = file.read()
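
One caveat with namelist()[0]: zip members can be directory entries, so the first name is not always a readable file. Below is a minimal sketch that skips them, assuming Python 3.6+ and an archive already on disk (for example the path returned by urlretrieve). The helper name read_first_file is hypothetical, not part of zipfile.

```python
import zipfile


def read_first_file(zip_path):
    """Return (name, bytes) for the first non-directory member of a zip."""
    with zipfile.ZipFile(zip_path, "r") as archive:
        for info in archive.infolist():
            # Directory entries end with "/" and carry no file data.
            if not info.is_dir():
                return info.filename, archive.read(info)
    raise ValueError("archive contains no files")
```

Calling read_first_file(filehandle) with the path from the snippet above would give the member name together with its raw bytes.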


As of 2020, you can use dload to download and unzip a file, i.e.:

import dload
dload.save_unzip("https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip")

By default it extracts to a dir on the script path with the zip file name, but you can specify the extract location:

dload.save_unzip("https://file-examples.com/wp-content/uploads/2017/02/zip_2MB.zip", "/extract/here")

Install it with pip install dload.


You can't seek on a urllib2.urlopened file. The methods it supports are listed here: http://docs.python.org/library/urllib.html#urllib.urlopen.

You'll have to retrieve the file (possibly with urllib.urlretrieve, http://docs.python.org/library/urllib.html#urllib.urlretrieve), then use zipfile on it.

Alternatively, if you want the zipped data in memory, you could read() the urlopened file, put it into a StringIO (io.BytesIO in Python 3), and use zipfile on that. Also check out the extract and extractall methods of zipfile if you just want to extract the file instead of using read.
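
The in-memory route could look roughly like this in Python 3. This is a sketch, not the only way to do it: it uses io.BytesIO in place of StringIO because the archive is binary data, and the function names unzip_bytes and download_and_unzip are made up for illustration.

```python
import io
import urllib.request
import zipfile


def unzip_bytes(data):
    """Map each file name in an in-memory zip archive to its contents."""
    # BytesIO supports seek(), which ZipFile needs and the HTTP response lacks.
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }


def download_and_unzip(url):
    """Fetch url and unzip the response body without touching the disk."""
    with urllib.request.urlopen(url) as response:
        return unzip_bytes(response.read())
```

This holds the whole archive and every extracted file in memory at once, so for large downloads the urlretrieve approach is probably the better fit.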


I do not have enough rep to comment, but regarding Marius's answer above: for Python 3, a slight modification is needed to the import and the urlretrieve call, since urllib has been split into several modules.

import urllib

Becomes:

import urllib.request

And

filehandle, _ = urllib.urlretrieve(url)

Becomes

filehandle, _ = urllib.request.urlretrieve(url)
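
If the same script has to run on both versions, one common pattern is to try the Python 3 location first and fall back to the Python 2 one. This is a sketch of that idea, not something from the answer above:

```python
try:
    # Python 3: urlretrieve lives in the urllib.request submodule.
    from urllib.request import urlretrieve
except ImportError:
    # Python 2: urlretrieve is a top-level function in urllib.
    from urllib import urlretrieve
```

After this, filehandle, _ = urlretrieve(url) works unchanged on either interpreter.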


Iterating on @Marius answer (which reads a single file directly from the zip), if you want to extract all files to a directory, do this:

import urllib.request
import zipfile

url = "http://www.gutenberg.lib.md.us/4/8/8/2/48824/48824-8.zip"
extract_dir = "example"

zip_path, _ = urllib.request.urlretrieve(url)
with zipfile.ZipFile(zip_path, "r") as f:
    f.extractall(extract_dir)

This stores the zip file in a temporary dir. If you want to keep it around, you can pass a filename to urlretrieve, e.g. urllib.request.urlretrieve(url, "my_zip_file.zip").
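
Wrapped as a function, the keep-the-zip variant might look like the sketch below. The name download_and_extract and its parameters are illustrative assumptions; only urlretrieve, ZipFile, extractall and namelist come from the standard library.

```python
import urllib.request
import zipfile


def download_and_extract(url, zip_path, extract_dir):
    """Save the archive at zip_path, unpack it, and return the member names."""
    # Passing a filename makes urlretrieve write there instead of a temp file.
    saved_path, _ = urllib.request.urlretrieve(url, zip_path)
    with zipfile.ZipFile(saved_path, "r") as archive:
        archive.extractall(extract_dir)
        return archive.namelist()
```

For example, download_and_extract(url, "my_zip_file.zip", "example") would leave the archive next to the script and the extracted files under example/. The zipfile documentation advises inspecting archives from untrusted sources before extracting them.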
