I have a microarray dataset with 38 rows and 7130 columns. I am trying to read the data but keep getting the above error.
I debugged and found that when I read the data, I get a 1x7129 array instead of a 38x7130 one. I don't know why. My 7130th column contains letters, while the rest of the data are numbers. Any idea why this is happening?
My file is tab-delimited text, and here is my code for reading the file:
clear;
fn=32;
col=fn+1;
cluster=2;
num_eachClass=3564;
row=num_eachClass*cluster;
fid1 = fopen('data.txt', 'r');
txt_format='';
for t=1:col
    txt_format=[txt_format '%g '];
end
data = fscanf(fid1,txt_format,[col row]);
data = data';
fclose(fid1);
Try this code to read the data:
filename = 'yourfilename.txt';
fid = fopen(filename,'r');
% If the file has a header line with column names, use the next 3 lines. Comment them out if not.
colnames = fgetl(fid);
colnames = textscan(colnames, '%s','delimiter','\t');
colnames = colnames{:};
% Reading the data
tsformat = [repmat('%f ',1,7129) '%s'];
datafromfile = textscan(fid,tsformat,'delimiter','\t','CollectOutput',1);
fclose(fid);
% Get the data from the cell array
data = datafromfile{1};
labels = datafromfile{2};
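To see how the '%f ... %s' format and 'CollectOutput' behave before running this on the full file, here is a minimal sketch on a made-up miniature file. The file contents, the header names (g1, g2, g3, class), the labels ALL/AML and the variable ncols are all assumptions for illustration only; your real file has 7129 numeric columns followed by one text column.

```matlab
% Hypothetical miniature file: 4 rows, 3 numeric columns, 1 text label column.
tmpfile = [tempname '.txt'];
fid = fopen(tmpfile, 'w');
fprintf(fid, 'g1\tg2\tg3\tclass\n');      % made-up header line
fprintf(fid, '1.5\t2\t3\tALL\n');
fprintf(fid, '4\t5.5\t6\tAML\n');
fprintf(fid, '7\t8\t9.5\tALL\n');
fprintf(fid, '10\t11\t12\tAML\n');
fclose(fid);

ncols = 3;                                 % number of numeric columns (7129 in the real file)
fid = fopen(tmpfile, 'r');
header = fgetl(fid);                       % consume the header line
tsformat = [repmat('%f', 1, ncols) '%s'];  % numeric columns, then one string column
C = textscan(fid, tsformat, 'Delimiter', '\t', 'CollectOutput', 1);
fclose(fid);
delete(tmpfile);

data = C{1};      % numeric columns collected into one matrix
labels = C{2};    % cell array of strings, one label per row

% Sanity checks: one row per sample, one label per row.
assert(isequal(size(data), [4 3]), 'unexpected data size');
assert(numel(labels) == size(data, 1), 'expected one label per row');
```

If the size check fails on your real file, the first thing to look at is probably whether every row really has the same number of tab-separated fields.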
EDIT: To split your dataset into training and test sets, do something like this:
train_samp = 1:19;
test_samp = 20:38;
train_data = data(train_samp,:);
test_data = data(test_samp,:);
train_label = labels(train_samp);
test_label = labels(test_samp);
You can also split the samples randomly:
samp_num = size(data,1);
test_num = 19;
randorder = randperm(samp_num);
train_samp = randorder(test_num+1:samp_num);
test_samp = randorder(1:test_num);
Note that I haven't transposed the data (data = data';). If you need to, just swap the row and column indices in the above code:
train_data = data(:,train_samp);
test_data = data(:,test_samp);
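As a quick way to convince yourself that the random split leaves no sample out and uses none twice, here is a small sketch on toy data. The random matrix, the 5 features, the alternating ALL/AML labels and the fixed seed are assumptions standing in for your real data and labels.

```matlab
% Toy stand-ins for the real data: 38 samples (rows), 5 features (assumed sizes).
rng(0);                                   % fix the seed so the split is repeatable
data = rand(38, 5);
labels = repmat({'ALL'; 'AML'}, 19, 1);   % made-up labels, one per sample

samp_num = size(data, 1);
test_num = 19;
randorder = randperm(samp_num);           % random permutation of 1..samp_num
test_samp = randorder(1:test_num);
train_samp = randorder(test_num+1:samp_num);

% Samples are rows here (no transposition), so index the first dimension.
train_data = data(train_samp, :);
test_data = data(test_samp, :);
train_label = labels(train_samp);
test_label = labels(test_samp);

% The two index sets should not overlap and together should cover every sample.
assert(isempty(intersect(train_samp, test_samp)));
assert(isequal(sort([train_samp test_samp]), 1:samp_num));
```

Fixing the seed is only for reproducibility while debugging; remove the rng call if you want a different split on each run.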