I'm using libsvm for multi-class classification of datasets with a large number of features/attributes (around 5,800 per item). I'd like to choose better values for the C and Gamma parameters than the defaults I am currently using.
I've already tried running easy.py, but for the datasets I'm using the estimated time is near forever (I ran easy.py on 20, 50, 100, and 200 data samples, and fitting the super-linear growth in runtime projected that a full run would take years).
Is there a way to more quickly arrive at better C and Gamma values than the defaults? I'm using the Java libraries, if that makes any difference.
It's possible to accomplish this without a grid search, which is what I believe easy.py does.
Look at this paper by Trevor Hastie et al.: The Entire Regularization Path for the Support Vector Machine (PDF). The key result is that the SVM solution is piecewise linear in the regularization parameter, so one "SVM run" can trace out the loss for all values of C in one shot, letting you see how C affects your SVM's performance.
They have an implementation of this algorithm that you can use in R through the svmpath package.
I believe the core of the algorithm is written in Fortran and wrapped in R.