I would like to scan text of textfiles in Matlab with the textscan function. Before I can open the textfile with fid = fopen('C:\path'), I need to unzip the files first. The files have the extension: *.gz
There are thousands of fi开发者_Python百科les which I need to analyze and high performance is important.
I have two ideas: (1) Use an external program an call it from the command line in Matlab (2) Use a Matlab 'zip'toolbox. I have heard of gunzip, but don't know about its performance.
Does anyone knows a way to unzip these files as quick as possible from within Matlab?
Thanks!
You could always try the Matlab unzip() function:
unzip
Extract contents of zip file
Syntax
unzip(zipfilename) unzip(zipfilename, outputdir) unzip(url, ...) filenames = unzip(...)
Description
unzip(zipfilename) extracts the archived contents of zipfilename into the current folder and sets the files' attributes, preserving the timestamps. It overwrites any existing files with the same names as those in the archive if the existing files' attributes and ownerships permit it. For example, files from rerunning unzip on the same zip filename do not overwrite any of those files that have a read-only attribute; instead, unzip issues a warning for such files.
Internally, this uses Java's zip library org.apache.tools.zip
. If your zip archives each contain many text files it might be faster to drop down into Java and extract them entry by entry, without explicitly unzipped files. look at the source of unzip.m to get some ideas, and also the Java documentation.
I've found 7zip-commandline(Windows) / p7zip(Unix) to be somewhat speedier for this.
[edit]From some quick testing, it seems making a system call to gunzip is faster than using MATLAB's native gunzip. You could give that a try as well.
Just write a new function that imitates basic MATLAB gunzip functionality:
function [] = sunzip(fullfilename,output_dir)
if ~exist('output_dir','var'), output_dir = fileparts(fullfilename); endapp_path = '/usr/bin/7za';
switches = ' e'; %extract files ignoring directory structure
options = [' -o' output_dir];system([app_path switches options '_' fullfilename]);
Then use it as you would use gunzip:
sunzip('/data/time_1000.out.gz',tmp_dir);
With MATLAB's toc
timer, I get the following extraction times with 6 uncompressed 114MB ASCII files:
gunzip: 10.15s
sunzip: 7.84s
worked well, just needed a minor change to Max's syntax calling the executable.
system([app_path switches ' ' fullfilename options ]);
精彩评论