After some frustration with unzip(1L), I've been trying to create a script that will unzip and print out raw data from all of the files inside a zip archive that is coming from stdin. I currently have the following, which works:
import sys, zipfile, StringIO
stdin = StringIO.StringIO(sys.stdin.read())
zipselect = zipfile.ZipFile(stdin)
filelist = zipselect.namelist()
for filename in filelist:
    print filename, ':'
    print zipselect.read(filename)
When I try to add validation to check if it truly is a zip file, however, it doesn't like it.
...
zipcheck = zipfile.is_zipfile(zipselect)
if zipcheck is not None:
    print 'Input is not a zip file.'
    sys.exit(1)
...
results in
  File "/home/chris/simple/zipcat/zipcat.py", line 13, in <module>
    zipcheck = zipfile.is_zipfile(zipselect)
  File "/usr/lib/python2.7/zipfile.py", line 149, in is_zipfile
    result = _check_zipfile(fp=filename)
  File "/usr/lib/python2.7/zipfile.py", line 135, in _check_zipfile
    if _EndRecData(fp):
  File "/usr/lib/python2.7/zipfile.py", line 203, in _EndRecData
    fpin.seek(0, 2)
AttributeError: ZipFile instance has no attribute 'seek'
I assume it can't seek because it is not a file, as such?
Sorry if this is obvious, this is my first 'go' with Python.
You should pass stdin to is_zipfile, not zipselect. is_zipfile takes a path to a file or a file object, not a ZipFile.
See the zipfile.is_zipfile documentation
You are correct that a ZipFile can't seek because it isn't a file. It's an archive, so it can contain many files.
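For illustration, here is a minimal sketch of that fix. It is written for Python 3, so it assumes io.BytesIO in place of the question's StringIO.StringIO, and the helper names open_zip_from_bytes and make_sample_zip are made up for this example. Note that is_zipfile returns True or False, so the check is a plain truth test rather than a comparison with None.

```python
import io
import zipfile


def open_zip_from_bytes(data):
    """Return a ZipFile for `data`, or raise ValueError if it is not a ZIP archive."""
    buffer = io.BytesIO(data)  # seekable, file-like view of the raw bytes
    # is_zipfile() takes a path or a file-like object, so it gets the buffer,
    # not a ZipFile. It returns True or False, never None.
    if not zipfile.is_zipfile(buffer):
        raise ValueError("Input is not a zip file.")
    buffer.seek(0)  # rewind before handing the same buffer to ZipFile
    return zipfile.ZipFile(buffer)


def make_sample_zip():
    """Build a tiny archive in memory so the example needs no files on disk."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        archive.writestr("hello.txt", "hello world\n")
    return out.getvalue()


archive = open_zip_from_bytes(make_sample_zip())
print(archive.namelist())
```

The validation runs before the ZipFile is constructed, which also avoids the BadZipFile exception that the constructor would otherwise raise on bad input.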
To do this entirely in memory will take some work. The AttributeError message means that the is_zipfile method is trying to use the seek method of the file handle you provide. But standard input is not seekable, and therefore your file object for it has no seek method.
If you really, really can't store the file on disk temporarily, then you could buffer the entire file in memory (you would need to enforce a size limit for security), and then implement some "duck" code that looks and acts like a seekable file object but really just uses the byte-string in memory.
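As a rough sketch of that buffering idea in Python 3: the standard library's io.BytesIO already behaves like a seekable file over an in-memory byte string, so the "duck" object does not have to be written by hand. The function name read_limited and the chunk_size default are assumptions made for this example, not anything from the question.

```python
import io


def read_limited(stream, max_bytes, chunk_size=65536):
    """Read a binary stream into a seekable BytesIO, refusing more than max_bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes means end of input
            break
        total += len(chunk)
        if total > max_bytes:
            # Stop early instead of buffering an arbitrarily large input.
            raise ValueError("input exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return io.BytesIO(b"".join(chunks))


buffer = read_limited(io.BytesIO(b"abcdef"), max_bytes=10)
buffer.seek(0, 2)  # seeking to the end works, unlike on a pipe
print(buffer.tell())
```

In a real script the stream would be sys.stdin.buffer. The limit only bounds the compressed input; the size of the decompressed members would need its own check.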
It is possible that you could cheat and buffer only enough of the data for is_zipfile to do its work, but I seem to recall that the table-of-contents for ZIP is at the end of the file. I could be wrong about that though.
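A quick way to check that recollection, sketched in Python 3: the ZIP format closes an archive with an "end of central directory" record whose signature is the bytes PK\x05\x06, and that record is 22 bytes long when the archive carries no comment. The names find_eocd_offset and make_sample_zip are invented for this example.

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # marks the end-of-central-directory record
EOCD_MIN_SIZE = 22              # size of that record when there is no comment


def find_eocd_offset(data):
    """Return the offset of the last end-of-central-directory signature, or -1."""
    return data.rfind(EOCD_SIGNATURE)


def make_sample_zip():
    """Build a small two-member archive in memory."""
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w") as archive:
        archive.writestr("a.txt", "first\n")
        archive.writestr("b.txt", "second\n")
    return out.getvalue()


data = make_sample_zip()
offset = find_eocd_offset(data)
# Distance from the signature to the end of the archive.
print(len(data) - offset)
# Everything before the record, i.e. only the "head" of the archive.
print(zipfile.is_zipfile(io.BytesIO(data[:offset])))
```

So buffering only the start of the input would not be enough for is_zipfile; it needs the tail of the archive, which on a pipe means reading everything first.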
Your 2011 python2 fragment was: StringIO.StringIO(sys.stdin.read())
In 2018 a python3 programmer might phrase that as: io.StringIO(...).
What you wanted was the following python3 fragment: io.BytesIO(...).
Certainly that works well for me when using the requests module to download binary ZIP files from webservers:
zf = zipfile.ZipFile(io.BytesIO(req.content))
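Putting that together, a Python 3 port of the original script might look roughly like the sketch below. The function name zipcat and the choice to pass the streams in as parameters are assumptions made here so the example can run without a real pipe; the output layout mimics the question's two print statements.

```python
import io
import zipfile


def zipcat(in_stream, out_stream):
    """Write the name and raw contents of every member of a ZIP read from in_stream."""
    # The archive arrives as bytes, so buffer it in BytesIO rather than StringIO.
    buffer = io.BytesIO(in_stream.read())
    if not zipfile.is_zipfile(buffer):
        raise ValueError("Input is not a zip file.")
    with zipfile.ZipFile(buffer) as archive:
        for name in archive.namelist():
            out_stream.write(name.encode("utf-8") + b" :\n")
            out_stream.write(archive.read(name))
            out_stream.write(b"\n")


# Demo with an in-memory archive instead of a real pipe.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as sample:
    sample.writestr("hello.txt", "hello world\n")
source.seek(0)

sink = io.BytesIO()
zipcat(source, sink)
print(sink.getvalue())
```

From a script, the call would be zipcat(sys.stdin.buffer, sys.stdout.buffer) after import sys, since sys.stdin itself is a text stream in Python 3.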