I开发者_JAVA技巧 tried RSAR, a free package, but I wonder if there any other good attribute reducers out there. Even packages for R or MATLAB, any resource capable of letting me find the minimal set of attributes which classify data.
For example, having a set with hundreds of examples of mail and different attributes which describe them and classified as spam or not spam, I want to find the minimal set of attributes that describe all the data, to discard useless information.
Considering the type of problem you describe, that is: choosing the right attributes for email classification, the best way might be to use Weka (Weka home). It has several feature-selection algorithms, which could be applied both interactively to visualize their effect, or in conjunction with various classification algorithms, to evaluate their effect on actual classification. (note that choosing attributes for classification without proper validation for a specific classifier might lead to less than optimal results in real life).
Some relevant links:
Weka's manual regarding attribute selection
A (somewhat outdated) hands-on example
you can use RoughSets package of R language. See the description of FS.one.reduct.computation in R (after installing RoughSets package)
e.g: HIRING2Matrix is a Decision Table with number of attributes. reduct1 is the reduced set of attributes
reduct1<- FS.one.reduct.computation(HIRING2Matrix, greedy = TRUE, power = 1)
精彩评论