I am trying out libsvm, following the example that trains an SVM on the heart_scale data that comes with the software. I want to use a chi2 kernel which I precompute myself. The classification rate on the training data drops to 24%. I am sure I compute the kernel correctly, but I guess I must be doing something wrong. The code is below. Can you see any mistakes? Help would be greatly appreciated.
%read in the data:
[heart_scale_label, heart_scale_inst] = libsvmread('heart_scale');
train_data = heart_scale_inst(1:150,:);
train_label = heart_scale_label(1:150,:);
%read somewhere that the kernel should not be sparse
ttrain = full(train_data)';
ttest = full(test_data)';
precKernel = chi2_custom(ttrain', ttrain');
model_precomputed = svmtrain2(train_label, [(1:150)', precKernel], '-t 4');
This is how the kernel is precomputed:
function res=chi2_custom(x,y)
a=size(x);
b=size(y);
res = zeros(a(1,1), b(1,1));
for i=1:a(1,1)
    for j=1:b(1,1)
        resHelper = chi2_ireneHelper(x(i,:), y(j,:));
        res(i,j) = resHelper;
    end
end
function resHelper = chi2_ireneHelper(x,y)
a=(x-y).^2;
b=(x+y);
resHelper = sum(a./(b + eps));
With a different SVM implementation (vlfeat) I obtain a classification rate of around 90% on the training data (yes, I tested on the training data, just to see what is going on). So I am pretty sure the libsvm result is wrong.
When working with support vector machines, it is very important to normalize the dataset as a pre-processing step. Normalization puts the attributes on the same scale and prevents attributes with large values from biasing the result. It also improves numerical stability (minimizes the likelihood of overflows and underflows due to floating-point representation).
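As a small illustration of the scaling step described above, here is a minimal sketch of min-max normalization on a made-up matrix. The values in `X`, the variable `span`, and the guard for constant columns are assumptions added for illustration; they are not part of the heart_scale example. Without such a guard, a constant column has a zero range and the division would produce NaN.

```matlab
%# toy data (hypothetical values); the third column is constant
X = [1 10 5; 2 20 5; 3 40 5];

%# per-column minimum and maximum
mn = min(X,[],1);  mx = max(X,[],1);

%# assumption: map constant columns to 0 instead of dividing by zero
span = mx - mn;
span(span == 0) = 1;

%# rescale every column to the [0,1] range
Xn = bsxfun(@rdivide, bsxfun(@minus, X, mn), span);
disp(Xn)
```

The same minimum and range would normally be reused for any data scaled later, so that all instances end up on the same scale.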
Also, to be exact, your calculation of the chi-squared kernel is slightly off. Instead, take the definition k(x,y) = 1 - sum_i (x_i - y_i)^2 / ((x_i + y_i)/2), and use this faster implementation of it:
function D = chi2Kernel(X,Y)
    D = zeros(size(X,1),size(Y,1));
    for i=1:size(Y,1)
        d = bsxfun(@minus, X, Y(i,:));
        s = bsxfun(@plus, X, Y(i,:));
        D(:,i) = sum(d.^2 ./ (s/2+eps), 2);
    end
    D = 1 - D;
end
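To see that the column-at-a-time computation gives the same values as a plain double loop over pairs of rows, the following sketch runs both on a tiny made-up input. The matrices `X` and `Y` and the names `R` and `maxDiff` are hypothetical and only serve this check; the loop body is copied inline so the snippet runs on its own.

```matlab
%# hypothetical data, already in the [0,1] range
X = [0.2 0.8; 0.5 0.5; 1.0 0.0];
Y = [0.1 0.9; 0.6 0.4];

%# vectorized: one column of the result per row of Y
D = zeros(size(X,1), size(Y,1));
for i=1:size(Y,1)
    d = bsxfun(@minus, X, Y(i,:));
    s = bsxfun(@plus,  X, Y(i,:));
    D(:,i) = sum(d.^2 ./ (s/2+eps), 2);
end
D = 1 - D;

%# naive reference: one pair of rows at a time
R = zeros(size(D));
for i=1:size(X,1)
    for j=1:size(Y,1)
        d = X(i,:) - Y(j,:);
        s = X(i,:) + Y(j,:);
        R(i,j) = 1 - sum(d.^2 ./ (s/2+eps));
    end
end

%# largest absolute difference between the two results
maxDiff = max(abs(D(:) - R(:)))
```

The result has one row per instance in `X` and one column per instance in `Y`, which is the layout used when the serial-number column is prepended for libsvm.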
Now consider the following example using the same dataset as you (code adapted from a previous answer of mine):
%# read dataset
[label,data] = libsvmread('./heart_scale');
data = full(data); %# sparse to full
%# normalize data to [0,1] range
mn = min(data,[],1); mx = max(data,[],1);
data = bsxfun(@rdivide, bsxfun(@minus, data, mn), mx-mn);
%# split into train/test datasets
trainData = data(1:150,:); testData = data(151:270,:);
trainLabel = label(1:150,:); testLabel = label(151:270,:);
numTrain = size(trainData,1); numTest = size(testData,1);
%# compute kernel matrices between every pair of (train,train) and
%# (test,train) instances and include sample serial number as first column
K = [ (1:numTrain)' , chi2Kernel(trainData,trainData) ];
KK = [ (1:numTest)' , chi2Kernel(testData,trainData) ];
%# view 'train vs. train' kernel matrix
figure, imagesc(K(:,2:end))
colormap(pink), colorbar
%# train model
model = svmtrain(trainLabel, K, '-t 4');
%# test on testing data
[predTestLabel, acc, decVals] = svmpredict(testLabel, KK, model);
cmTest = confusionmat(testLabel,predTestLabel)
%# test on training data
[predTrainLabel, acc, decVals] = svmpredict(trainLabel, K, model);
cmTrain = confusionmat(trainLabel,predTrainLabel)
The result on the testing data:
Accuracy = 84.1667% (101/120) (classification)
cmTest =
62 8
11 39
and on the training data, we get around 90% accuracy as you expected:
Accuracy = 92.6667% (139/150) (classification)
cmTrain =
77 3
8 62
The problem is the following line, which computes a chi-squared distance (0 for identical vectors) rather than a kernel:
resHelper = sum(a./(b + eps));
It should be:
resHelper = 1-sum(2*a./(b + eps));
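The difference between the two lines can be checked on a pair of small vectors. This is only a sketch: the vectors `x` and `y` and the names `distOrig` and `kernFixed` are made up for illustration, and the two expressions are wrapped in anonymous functions so the snippet runs on its own.

```matlab
%# hypothetical non-negative feature vectors
x = [0.2 0.8];  y = [0.5 0.5];

a = @(x,y) (x-y).^2;
b = @(x,y) (x+y);
distOrig  = @(x,y) sum(a(x,y)./(b(x,y) + eps));        %# original line
kernFixed = @(x,y) 1-sum(2*a(x,y)./(b(x,y) + eps));    %# corrected line

%# a vector compared with itself
selfOrig  = distOrig(x,x)      %# 0, as expected of a distance
selfFixed = kernFixed(x,x)     %# 1, as expected of a similarity

%# two different vectors
pairOrig  = distOrig(x,y)
pairFixed = kernFixed(x,y)
```

With the original line, a vector compared with itself gets the value 0 while different vectors get larger values, which is the opposite of what a kernel passed to libsvm with '-t 4' is meant to express. The corrected line is simply 1 minus twice the original value.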