Validating a zip file coming from stdin

Developer https://www.devze.com 2023-03-19 06:13 Source: Internet
After some frustration with unzip(1L), I've been trying to create a script that will unzip and print out raw data from all of the files inside a zip archive that is coming from stdin. I currently have the following, which works:

import sys, zipfile, StringIO

stdin = StringIO.StringIO(sys.stdin.read())
zipselect = zipfile.ZipFile(stdin)

filelist = zipselect.namelist()
for filename in filelist:
    print filename, ':' 
    print zipselect.read(filename)

When I try to add validation to check if it truly is a zip file, however, it doesn't like it.

...

zipcheck = zipfile.is_zipfile(zipselect)
if zipcheck is not None:
    print 'Input is not a zip file.'
    sys.exit(1)

...

results in

File "/home/chris/simple/zipcat/zipcat.py", line 13, in <module>
  zipcheck = zipfile.is_zipfile(zipselect)
File "/usr/lib/python2.7/zipfile.py", line 149, in is_zipfile
  result = _check_zipfile(fp=filename)
File "/usr/lib/python2.7/zipfile.py", line 135, in _check_zipfile
  if _EndRecData(fp):
File "/usr/lib/python2.7/zipfile.py", line 203, in _EndRecData
  fpin.seek(0, 2)
AttributeError: ZipFile instance has no attribute 'seek'

I assume it can't seek because it is not a file, as such?

Sorry if this is obvious; this is my first 'go' with Python.


You should pass stdin to is_zipfile, not zipselect. is_zipfile takes a path to a file or a file object, not a ZipFile.

See the zipfile.is_zipfile documentation

You are correct that a ZipFile can't seek because it isn't a file. It's an archive, so it can contain many files.
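As a rough illustration of the point above, here is a minimal sketch in Python 3 (the question itself uses Python 2). The helper name open_zip_from_bytes is made up for this example. It hands is_zipfile the seekable buffer rather than the ZipFile, and it tests the result as a boolean, since is_zipfile returns True or False and never None.

```python
import io
import zipfile


def open_zip_from_bytes(data):
    """Return a ZipFile for data, or None if data is not a zip archive.

    Hypothetical helper for illustration; not part of the zipfile module.
    """
    buffer = io.BytesIO(data)  # seekable, file-like wrapper around the raw bytes
    # is_zipfile accepts a path or a file-like object and returns True/False,
    # so test its truth value instead of comparing the result with None.
    if not zipfile.is_zipfile(buffer):
        return None
    buffer.seek(0)  # rewind before handing the same buffer to ZipFile
    return zipfile.ZipFile(buffer)
```

With the check written as "if zipcheck is not None", the error branch would run for every input, because both True and False are "not None".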


To do this entirely in memory will take some work. The AttributeError message means that the is_zipfile function is calling the seek method of the object you passed in. In your traceback that object is the ZipFile instance, which has no seek method. Standard input is usually not seekable either when it is a pipe, so it has to be buffered before is_zipfile can examine it.

If you really, really can't store the file on disk temporarily, then you could buffer the entire file in memory (you would need to enforce a size limit for security), and then implement some "duck" code that looks and acts like a seekable file object but really just uses the byte-string in memory.
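A sketch of the buffering idea, assuming Python 3: the standard library's io.BytesIO already behaves like a seekable file backed by an in-memory byte string, so the "duck" object does not have to be written by hand. The function name read_limited and the 64 MiB limit are assumptions made for this example.

```python
import io

MAX_ARCHIVE_BYTES = 64 * 1024 * 1024  # assumed limit; pick one that suits your input


def read_limited(stream, limit=MAX_ARCHIVE_BYTES, chunk_size=65536):
    """Copy a binary stream into a seekable BytesIO, refusing more than limit bytes."""
    buffer = io.BytesIO()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object means end of input
            break
        total += len(chunk)
        if total > limit:
            raise ValueError("input exceeds %d bytes" % limit)
        buffer.write(chunk)
    buffer.seek(0)  # rewind so callers start reading from the beginning
    return buffer
```

In a script this would typically be called as read_limited(sys.stdin.buffer), and the returned buffer can be passed to zipfile.is_zipfile and zipfile.ZipFile.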

It is possible that you could cheat and buffer only enough of the data for is_zipfile to do its work, but I seem to recall that the table-of-contents for ZIP is at the end of the file. I could be wrong about that though.
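For what it is worth, the recollection is right: the ZIP format ends with an "end of central directory" record, which is why a zip cannot be validated from its first bytes alone. The sketch below (Python 3) only locates that record in a byte string; the function name find_eocd_offset is made up here, and this is not how the zipfile module must be used, just a way to see where the record sits.

```python
EOCD_SIGNATURE = b"PK\x05\x06"  # marks the end-of-central-directory record
EOCD_MIN_SIZE = 22              # size of the record when it carries no comment
MAX_COMMENT_SIZE = 65535        # the comment length is stored in a 16-bit field


def find_eocd_offset(data):
    """Return the offset of the last end-of-central-directory signature, or -1."""
    # The record can only start within the last 22 + 65535 bytes of the archive,
    # so only that tail needs to be searched.
    tail_start = max(0, len(data) - (EOCD_MIN_SIZE + MAX_COMMENT_SIZE))
    return data.rfind(EOCD_SIGNATURE, tail_start)
```

Because the record is in the tail, buffering only the beginning of the stream would not be enough; the whole input has to arrive before the check can succeed.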


Your 2011 Python 2 fragment was: StringIO.StringIO(sys.stdin.read())

In 2018 a Python 3 programmer might phrase that as: io.StringIO(...).

What you wanted is the following Python 3 fragment: io.BytesIO(...), because a zip archive is binary data and io.StringIO only holds text. Certainly that works well for me when using the requests module to download binary ZIP files from web servers:

zf = zipfile.ZipFile(io.BytesIO(req.content))
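Applied to the original stdin script, a Python 3 port might look roughly like the sketch below. The function name zipcat is borrowed from the script path in the traceback; taking the input and output streams as parameters is a choice made for this example so the function can be tried without a real pipe.

```python
import io
import zipfile


def zipcat(stream, out):
    """Write the name and raw contents of every member of a zip read from stream.

    Both stream and out are binary file objects. ZipFile raises
    zipfile.BadZipFile if the buffered input is not a zip archive.
    """
    buffer = io.BytesIO(stream.read())  # bytes, not text: zip data is binary
    with zipfile.ZipFile(buffer) as archive:
        for name in archive.namelist():
            out.write(name.encode("utf-8") + b" :\n")
            out.write(archive.read(name))
            out.write(b"\n")
```

From a script it would be called as zipcat(sys.stdin.buffer, sys.stdout.buffer); the .buffer attributes give the underlying binary streams, whereas sys.stdin.read() in Python 3 returns decoded text.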