Is there a Python library that allows manipulation of zip archives in memory, without having to use actual disk files?
The ZipFile library does not allow you to update the archive. The only way seems to be to extract it to a directory, make your changes, and create a new zip from that directory. I want to modify zip archives without disk access, because I'll be downloading them, making changes, and uploading them again, so I have no reason to store them.
Something similar to Java's ZipInputStream/ZipOutputStream would do the trick, although any interface at all that avoids disk access would be fine.
According to the Python docs:
class zipfile.ZipFile(file[, mode[, compression[, allowZip64]]])
Open a ZIP file, where file can be either a path to a file (a string) or a file-like object.
So, to open the file in memory, just create a file-like object (perhaps using BytesIO).
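import io
import zipfile
# my_zip_data holds the bytes of an existing archive (e.g. a downloaded file)
file_like_object = io.BytesIO(my_zip_data)
zipfile_ob = zipfile.ZipFile(file_like_object)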
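As a minimal sketch of the idea above, the following reads every member of an archive that exists only as bytes. The helper names `make_sample_zip` and `read_members` are made up for illustration, and the sample archive stands in for data you would have downloaded.

```python
import io
import zipfile


def make_sample_zip() -> bytes:
    """Build a small archive in memory to stand in for downloaded data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("hello.txt", "hello")
        archive.writestr("docs/readme.txt", "read me")
    return buffer.getvalue()


def read_members(zip_bytes: bytes) -> dict:
    """Return {member name: content bytes} for an archive held in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


members = read_members(make_sample_zip())
```

Nothing here touches the file system: both the source bytes and the extracted contents stay in memory.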
PYTHON 3
import io
import zipfile
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "a",
                     zipfile.ZIP_DEFLATED, False) as zip_file:
    for file_name, data in [('1.txt', io.BytesIO(b'111')),
                            ('2.txt', io.BytesIO(b'222'))]:
        zip_file.writestr(file_name, data.getvalue())
with open('C:/1.zip', 'wb') as f:
    f.write(zip_buffer.getvalue())
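The snippet above creates an archive. For the "download, change, upload" case in the question, one possible approach (a sketch, not the only way) is to copy the members into a second in-memory archive and swap in the changed file along the way, since zipfile cannot replace a member in place. The function name `replace_member` is hypothetical.

```python
import io
import zipfile


def replace_member(zip_bytes: bytes, name: str, new_data: bytes) -> bytes:
    """Return a new archive with `name` replaced (or added); other members are copied."""
    source = io.BytesIO(zip_bytes)
    target = io.BytesIO()
    with zipfile.ZipFile(source) as zin, \
            zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zout:
        for info in zin.infolist():
            if info.filename != name:
                # Passing the ZipInfo keeps the member's name, date and attributes.
                zout.writestr(info, zin.read(info.filename))
        zout.writestr(name, new_data)
    return target.getvalue()


# Build a small archive in memory, then replace one of its members.
original = io.BytesIO()
with zipfile.ZipFile(original, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("1.txt", b"111")
    zf.writestr("2.txt", b"222")

updated = replace_member(original.getvalue(), "1.txt", b"changed")
```

The whole archive is held in memory twice while copying, so this suits archives that comfortably fit in RAM.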
From the article In-Memory Zip in Python:
Below is a post of mine from May of 2008 on zipping in memory with Python, re-posted since Posterous is shutting down.
I recently noticed that there is a for-pay component available to zip files in-memory with Python. Considering this is something that should be free, I threw together the following code. It has only gone through very basic testing, so if anyone finds any errors, let me know and I’ll update this.
import zipfile
import StringIO

class InMemoryZip(object):
    def __init__(self):
        # Create the in-memory file-like object
        self.in_memory_zip = StringIO.StringIO()

    def append(self, filename_in_zip, file_contents):
        '''Appends a file with name filename_in_zip and contents of
        file_contents to the in-memory zip.'''
        # Get a handle to the in-memory zip in append mode
        zf = zipfile.ZipFile(self.in_memory_zip, "a", zipfile.ZIP_DEFLATED, False)
        # Write the file to the in-memory zip
        zf.writestr(filename_in_zip, file_contents)
        # Mark the files as having been created on Windows so that
        # Unix permissions are not inferred as 0000
        for zfile in zf.filelist:
            zfile.create_system = 0
        return self

    def read(self):
        '''Returns a string with the contents of the in-memory zip.'''
        self.in_memory_zip.seek(0)
        return self.in_memory_zip.read()

    def writetofile(self, filename):
        '''Writes the in-memory zip to a file.'''
        f = file(filename, "w")
        f.write(self.read())
        f.close()

if __name__ == "__main__":
    # Run a test
    imz = InMemoryZip()
    imz.append("test.txt", "Another test").append("test2.txt", "Still another")
    imz.writetofile("test.zip")
The example Ethier provided has several problems, some of them major:
- doesn't work for real data on Windows. A ZIP file is binary and its data should always be written with a file opened 'wb'
- the ZIP file is appended to for each file, which is inefficient. It can just be opened and kept as an InMemoryZip attribute
- the documentation states that ZIP files should be closed explicitly; this is not done in the append function (it probably works for the example because zf goes out of scope and that closes the ZIP file)
- the create_system flag is set for all the files in the zipfile every time a file is appended instead of just once per file.
- on Python < 3 cStringIO is much more efficient than StringIO
- doesn't work on Python 3 (the original article was from before the 3.0 release, but by the time the code was posted 3.1 had been out for a long time).
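To make the points in this list concrete, here is a rough Python 3 sketch of how the class could look with them addressed, using only the standard library. It is not the code of any published package; the method names mirror the article's class and the details are assumptions.

```python
import io
import time
import zipfile


class InMemoryZip:
    """Sketch: an in-memory archive that keeps one ZipFile open until closed."""

    def __init__(self):
        self._buffer = io.BytesIO()
        # Opened once and kept as an attribute instead of reopened per append.
        self._zip = zipfile.ZipFile(self._buffer, "w", zipfile.ZIP_DEFLATED)

    def append(self, filename_in_zip, file_contents):
        """Add one member (str or bytes); returns self so calls can be chained."""
        info = zipfile.ZipInfo(filename_in_zip, date_time=time.localtime()[:6])
        info.compress_type = zipfile.ZIP_DEFLATED
        info.create_system = 0  # set once, for this member only
        self._zip.writestr(info, file_contents)
        return self

    def read(self):
        """Close the archive explicitly and return its bytes."""
        self._zip.close()  # safe to call more than once
        return self._buffer.getvalue()

    def writetofile(self, filename):
        """Write the archive to disk in binary mode."""
        with open(filename, "wb") as f:
            f.write(self.read())


imz = InMemoryZip()
imz.append("test.txt", "Another test").append("test2.txt", b"Still another")
zip_bytes = imz.read()
```

After `read()` the archive is closed, so no further members can be appended in this sketch.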
An updated version is available if you install ruamel.std.zipfile (of which I am the author). After
pip install ruamel.std.zipfile
or including the code for the class from here, you can do:
import ruamel.std.zipfile as zipfile
# Run a test
imz = zipfile.InMemoryZipFile()
imz.append("test.txt", "Another test").append("test2.txt", "Still another")
imz.writetofile("test.zip")
You can alternatively write the contents using imz.data to any place you need.
You can also use the with statement, and if you provide a filename, the contents of the ZIP will be written on leaving that context:
with zipfile.InMemoryZipFile('test.zip') as imz:
    imz.append("test.txt", "Another test").append("test2.txt", "Still another")
Because writing to disk is delayed until the context exits, you can actually read from an old test.zip within that context.
I am using Flask to create an in-memory zipfile and return it as a download. It builds on Vladimir's example above. The seek(0) took a while to figure out.
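import io
import zipfile
from flask import send_file

def download_zip():  # hypothetical view function; register it as a route on your app
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zip_file:
        for file_name, data in [('1.txt', io.BytesIO(b'111')), ('2.txt', io.BytesIO(b'222'))]:
            zip_file.writestr(file_name, data.getvalue())
    # Rewind so send_file reads from the start of the buffer
    zip_buffer.seek(0)
    # Flask >= 2.0 calls this argument download_name; older versions used attachment_filename
    return send_file(zip_buffer, download_name='filename.zip', as_attachment=True)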
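The reason seek(0) matters can be seen with the standard library alone. This small sketch (variable names are illustrative) shows that once the archive has been written, the buffer's position sits at the end, so anything that consumes it with read() gets nothing until it is rewound, whereas getvalue() ignores the position.

```python
import io
import zipfile

zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as zip_file:
    zip_file.writestr("1.txt", b"111")

# After writing, the position is at the end of the buffer, so read() finds nothing.
position_after_write = zip_buffer.tell()
unread = zip_buffer.read()

# getvalue() ignores the position and always returns the whole buffer.
whole = zip_buffer.getvalue()

# Rewinding lets a consumer that calls read() see the archive from its first byte.
zip_buffer.seek(0)
rewound = zip_buffer.read()
```

So code that hands the buffer object itself to another API needs the rewind, while code that passes `getvalue()` does not.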
I want to modify zip archives without disk access, because I'll be downloading them, making changes, and uploading them again, so I have no reason to store them
This is possible using the two libraries https://github.com/uktrade/stream-unzip and https://github.com/uktrade/stream-zip (full disclosure: written by me). And depending on the changes, you might not even have to store the entire zip in memory at once.
Say you just want to download, unzip, zip, and re-upload. Slightly pointless, but you could slot in some changes to the unzipped content:
from datetime import datetime
import httpx
from stream_unzip import stream_unzip
from stream_zip import stream_zip, ZIP_64

def get_source_bytes_iter(url):
    with httpx.stream('GET', url) as r:
        yield from r.iter_bytes()

def get_target_files(files):
    # stream-unzip doesn't expose perms or modified_at, but stream-zip requires them
    modified_at = datetime.now()
    perms = 0o600
    for name, _, chunks in files:
        # Could change name, manipulate chunks, skip a file, or yield a new file
        yield name.decode(), modified_at, perms, ZIP_64, chunks

source_url = 'https://source.test/file.zip'
target_url = 'https://target.test/file.zip'
source_bytes_iter = get_source_bytes_iter(source_url)
source_files = stream_unzip(source_bytes_iter)
target_files = get_target_files(source_files)
target_bytes_iter = stream_zip(target_files)
httpx.put(target_url, data=target_bytes_iter)
Helper to create an in-memory zip file with multiple files, based on data like {'1.txt': 'string', '2.txt': b'bytes'}
import io, zipfile

def prepare_zip_file_content(file_name_content: dict) -> bytes:
    """Return the bytes of a zip archive, ready to be saved with
    open('C:/1.zip', 'wb') as f: f.write(bytes)

    file_name_content: dict like {'1.txt': 'string', '2.txt': b'bytes'}
    """
    zip_buffer = io.BytesIO()
    with zipfile.ZipFile(zip_buffer, "a", zipfile.ZIP_DEFLATED, False) as zip_file:
        for file_name, file_data in file_name_content.items():
            # writestr accepts both str and bytes
            zip_file.writestr(file_name, file_data)
    return zip_buffer.getvalue()