
python unzip -- tremendously slow?

开发者 https://www.devze.com 2023-02-10 05:57 Source: web

Can somebody please explain the following mystery?

I created a binary file of about 37 MB. Zipping it in Ubuntu from the terminal took less than 1 second. I then tried Python: zipping it programmatically with the zipfile module also took about 1 second.

I then tried to unzip the archive I had created. In Ubuntu, from the terminal, this took less than 1 second.

In Python, the code to unzip it (using the zipfile module) took close to 37 seconds to run! Any ideas why?


I was also struggling to unzip/decompress/extract zip files with Python, and the low-level approach of "create a ZipFile object, loop through its .namelist(), read the files and write them to the file system" didn't seem very Pythonic. So I started digging into ZipFile objects, which I believe are not very well documented, and listed all of the object's methods:

>>> from zipfile import ZipFile
>>> filepath = '/srv/pydocfiles/packages/ebook.zip'
>>> zip = ZipFile(filepath)
>>> dir(zip)
['NameToInfo', '_GetContents', '_RealGetContents', '__del__', '__doc__', '__enter__', '__exit__', '__init__', '__module__', '_allowZip64', '_didModify', '_extract_member', '_filePassed', '_writecheck', 'close', 'comment', 'compression', 'debug', 'extract', 'extractall', 'filelist', 'filename', 'fp', 'getinfo', 'infolist', 'mode', 'namelist', 'open', 'printdir', 'pwd', 'read', 'setpassword', 'start_dir', 'testzip', 'write', 'writestr'] 

There we go: the "extractall" method works just like tarfile's extractall! (It is available in Python 2.6 and 2.7, but NOT in 2.5.)
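As a minimal sketch of the extractall approach described above, the snippet below builds a tiny throwaway archive and extracts it again. It is written for Python 3; the helper name extract_archive, the member name docs/readme.txt and the temporary paths are assumptions made up for this illustration, not part of the original answer.

```python
import os
import tempfile
from zipfile import ZipFile, ZIP_DEFLATED


def extract_archive(archive_path, target_dir):
    """Extract every member of archive_path into target_dir and return the member names."""
    # The with-block closes the archive even if extraction raises.
    with ZipFile(archive_path) as archive:
        archive.extractall(path=target_dir)
        return archive.namelist()


# Build a small demo archive in a temporary directory (hypothetical content).
work_dir = tempfile.mkdtemp()
archive_path = os.path.join(work_dir, "demo.zip")
with ZipFile(archive_path, "w", compression=ZIP_DEFLATED) as archive:
    archive.writestr("docs/readme.txt", "hello")

# Extract it; extractall creates the missing target directories itself.
out_dir = os.path.join(work_dir, "out")
names = extract_archive(archive_path, out_dir)
print(names)
```

One caveat: extractall writes wherever the member names point inside the target directory, so it is best used on archives from a trusted source.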

Then the performance concerns: the file ebook.zip is 84.6 MB (mostly PDF files) and the uncompressed folder is 103 MB; it was zipped with the default "Archive Utility" under Mac OS X 10.5. So I timed the extraction with Python's timeit module:

>>> from timeit import Timer
>>> t = Timer("filepath = '/srv/pydocfiles/packages/ebook.zip'; \
...         extract_to = '/tmp/pydocnet/build'; \
...         from zipfile import ZipFile; \
...         ZipFile(filepath).extractall(path=extract_to)")
>>> 
>>> t.timeit(1)
1.8670060634613037

This took less than 2 seconds on a heavily loaded machine where 90% of the memory was in use by other applications.

Hope this helps someone.


I don't know what code you use to unzip your file, but the following works for me: After creating a zip archive "test.zip" containing just one file "file1", the following Python script extracts "file1" from the archive:

from zipfile import ZipFile

# Open the archive read-only; the compression method is read from the archive itself.
archive = ZipFile("test.zip", mode='r')
data = archive.read("file1")  # decompress the whole member into memory
archive.close()
print(len(data))

This takes nearly no time: I tried a 37 MB input file, which compressed down to a 15 MB zip archive. In this example the Python script took 0.346 seconds on my MacBook Pro. Maybe in your case the 37 seconds were taken up by something you did with the data instead?
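One way to check that suspicion is to time the decompression separately from whatever is done with the data afterwards. The sketch below (Python 3) does this on an in-memory archive. The per-byte checksum is only a hypothetical stand-in for "something you did with the data", since the question's actual code is not shown; the helper names and the 1 MB random payload are likewise assumptions for illustration.

```python
import io
import os
import time
from zipfile import ZipFile, ZIP_DEFLATED


def timed(func, *args):
    """Call func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def read_member(archive_file, member_name):
    """Decompress one member fully into memory."""
    with ZipFile(archive_file) as archive:
        return archive.read(member_name)


def checksum_per_byte(data):
    """Hypothetical post-processing: a pure-Python loop over every byte."""
    total = 0
    for byte in data:
        total = (total + byte) % 65536
    return total


# Build a 1 MB test archive in memory (assumed size, for illustration only).
payload = os.urandom(1024 * 1024)
buffer = io.BytesIO()
with ZipFile(buffer, "w", compression=ZIP_DEFLATED) as archive:
    archive.writestr("file1", payload)
buffer.seek(0)

data, read_seconds = timed(read_member, buffer, "file1")
_, process_seconds = timed(checksum_per_byte, data)
print("read: %.4f s, processing: %.4f s" % (read_seconds, process_seconds))
```

If the second number dominates, the time is going into the post-processing rather than into zipfile itself.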


Instead of using the Python module, we can call an archiving tool available on Ubuntu (here 7z) from Python. I use this because Python's zip handling sometimes fails.

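import os

filename = 'test'  # name of the file to archive
# '7z a' adds the file to test.zip; os.system returns the exit status (0 on success)
os.system('7z a %s.zip %s' % (filename, filename))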
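A variation on the same idea, sketched for Python 3, passes the command to subprocess as an argument list, so file names containing spaces do not need shell quoting, and first checks that the tool is installed. The helper names make_7z_add_command and run_if_available are hypothetical, and the sketch assumes the 7z binary is on PATH under that name.

```python
import shutil
import subprocess


def make_7z_add_command(archive_name, source_path):
    """Return the argument list for `7z a <archive> <source>`."""
    return ["7z", "a", archive_name, source_path]


def run_if_available(command):
    """Run command if its executable is on PATH; return the exit code, else None."""
    if shutil.which(command[0]) is None:
        return None  # tool not installed; nothing was run
    return subprocess.run(command, check=False).returncode


command = make_7z_add_command("test.zip", "test")
print(command)
# To actually create the archive (requires 7z and a file named "test"):
# exit_code = run_if_available(command)
```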
