There are many reports of开发者_JAVA百科 slow performance of Octave's dlmread
. I was hoping that this was fixed in 3.2.4, but when I tried to load a csv file that has a size of ca. 8 * 4 mil (32 mil in total), it also took very, very long time. I searched the web but could not find a workaround for this. Does anybody know a good workaround?
I experienced the same problem and had R handy, so my solution was to use "read.csv" in R, and then use the R package "R.matlab" to write a ".mat" file, and then load that in Octave.
"read.csv" can be pretty slow too, but this worked very well in my case.
The reason is that Octave has a bug that adding data to a very large matrix takes more time then adding the same amount of data to a small matrix.
Below is my try. I choose to save data each 50000 lines, so meanwhile I could already take a look instead of being forced to wait. It is slower for small files, but much faster for larger files.
function alldata = load_data(filename)
fid = fopen(filename,'r');
s=0;
data=[];
alldata=[];
save "temp.mat" alldata;
if fid == -1
disp("Couldn't find file mydata");
else
while (~feof(fid))
line = fgetl(fid);
[t1,t2,t3,t4,d] = sscanf(line,'%i:%i:%i:%i %f', "C"); #reading time as hh:mm:ss:ms and data as float
s++;
t = (t1 * 3600000 + t2 * 60000 + t3 * 1000 + t4);
data = [data; t, d];
if (mod(s,10000) == 0)
#disp(s), disp(" "), disp(t), disp(" "), disp(d), disp("\n");
disp(s);
fflush(stdout);
end
if (mod(s,50000) == 0)
load "temp.mat";
alldata=[alldata; data];
data=[];
save "temp.mat" alldata;
disp("data saved");
fflush(stdout);
end
end
disp(s);
load "temp.mat";
alldata=[alldata; data];
save "temp.mat" alldata;
disp("data saved");
fflush(stdout);
end
fclose(fid);
Here is a workaround that I am using.
I did not find that sscanf will parse input lines as indicated above. Also, I didn't use the temp file.
My .csv file has a large number of rows. They begin with a header of 18 lines and are followed by a data block, each of which has 135 columns. The following code has been tested. My file also begins each row with a dd/mm/yyyy hh:mm field. This will also catch poor lines and indicate where they are by using try/catch.
My .csv file came from a customer who dumped his PARCView load in an Excel file.
function [tags,descr,alldata] = fbcsvread(filename)
fid = fopen(filename,'r');
s = 0;
data=[];
alldata=zeros(1,135);
if fid==-1
disp("Couldn't find file %s\n",filename);
else
linecount = 1;
while (~feof(fid))
line = fgetl(fid);
data2 = zeros(1,135);
if linecount == 1
tags = strsplit(line,",");
elseif linecount == 2
descr = strsplit(line,",");
elseif linecount >= 19
data = strsplit(line,",");
datetime = strsplit(char(data(1))," ");
modyyr = strsplit(char(datetime(1)),"/");
hrmin = strsplit(char(datetime(2)),":");
year1 = sscanf(char(modyyr(3)),"%d","C");
day1 = sscanf(char(modyyr(2)),"%d","C");
month1 = sscanf(char(modyyr(1)),"%d","C");
hour1 = sscanf(char(hrmin(1)),"%d","C");
minute1 = sscanf(char(hrmin(2)),"%d","C");
realtime = datenum(year1,month1,day1,hour1,minute1);
data2(1) = realtime;
for location = 2:134
try
data2(location) = sscanf(char(data(location)),"%f","C");
catch
printf("Error at %s %s\n",char(datetime(1)),char(datetime(2)) );
fflush(stdout);
end_try_catch
endfor
alldata(linecount-18,:) = data2;
if mod(linecount,50) == 0
printf(".");
fflush(stdout);
endif
endif
linecount = linecount + 1;
endwhile
fclose(fid);
endif
endfunction
精彩评论