Reducing variable set of a prediction data frame by fetching variables from a tree model

Developer https://www.devze.com 2023-02-16 14:27 Source: Internet
I built an rpart tree model, and now I want to extract only the variables used by this model from a big prediction data frame (over 7,000 variables). I have to do some calculations on this prediction data frame before predicting, and those calculations exceed the available memory.

I don't know how to extract the variables from the rpart model. For randomForest models there is the function varUsed, but perhaps the problem can be solved in a general way that also works for a glm model.

Calling names() on the rpart model returns:

"frame"     "where"     "call"      "terms"     "cptable"   "splits"    "method"
"parms"     "control"   "functions" "model"     "y"         "ordered" 

The splits element returns:

                      count ncat     improve index        adj
m24_a_ec_fakt          6000   -1 0.026346646  0.15 0.00000000
m24_a_ec_fakt_dwl      6000   -1 0.026346646  0.15 0.00000000
m3_a_fak_rech          6000   -1 0.022821246  0.30 0.00000000
m9_a_ec_fakt           6000   -1 0.021599372  0.05 0.00000000
m9_a_ec_fakt_dwl       6000   -1 0.021599372  0.05 0.00000000
...

splits is a matrix, and the first column(?) seems to hold the variable names.

Can I somehow use this matrix to filter the variables of my prediction data frame by name?

Something like:

newPredDM <- oldPredDM[  --GET THE VARIABLE NAMES FROM rpart model somehow--  ]

Regards, and thanks for the help, Rainer


See help("rpart.object") for the structure of the returned value. Since

frame: data frame with one row for each node in the tree. [...] Elements of ‘frame’ include ‘var’, a factor giving the variable used in the split at each node

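you can use levels(fit$frame$var)[-1] to get the column names as a character vector (the [-1] is meant to drop the "<leaf>" level that marks terminal nodes, which assumes it is the first level) and use something like

newPredDM <- oldPredDM[, levels(fit$frame$var)[-1]]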

for your selection.
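A slightly more defensive variant of the same idea, as a minimal sketch: it matches "<leaf>" by name instead of by position, and it converts frame$var with as.character() so it does not matter whether that column is stored as a factor or as a character vector. The kyphosis data set ships with rpart and only stands in for the real training data; the extra "unused" column and the names used_vars and split_vars are made up for illustration.

```r
library(rpart)

# Example data shipped with rpart; stands in for the real training set.
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis)

# Variables used in primary splits. as.character() makes this work whether
# frame$var is a factor or a character vector; "<leaf>" marks terminal nodes.
used_vars <- setdiff(unique(as.character(fit$frame$var)), "<leaf>")

# rownames(fit$splits) also covers competitor and surrogate splits,
# so it can list more variables than used_vars.
split_vars <- unique(rownames(fit$splits))

# Hypothetical prediction frame: the predictors plus one column the tree never saw.
oldPredDM <- cbind(kyphosis[, c("Age", "Number", "Start")], unused = 0)

# drop = FALSE keeps a data frame even if only one variable was used.
newPredDM <- oldPredDM[, used_vars, drop = FALSE]
```

This also bears on the splits matrix from the question: the variable names there are its row names rather than a column, and because it includes competitor and surrogate splits, filtering by it would typically keep more columns than the tree's primary splits need.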

Hope this helps.
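As for the more general case raised in the question (for example a glm model), one possible approach is to read the predictor names from the model's terms object. The sketch below uses the built-in mtcars data purely for illustration; fit_glm, glm_vars and reduced are hypothetical names. Note that for an rpart fit this would return every candidate predictor in the formula, not only the ones the tree actually split on, so it does not replace the frame$var approach above.

```r
# Small Gaussian glm on a built-in data set, for illustration only.
fit_glm <- glm(mpg ~ wt + hp, data = mtcars)

# Predictor names from the model's terms, with the response removed.
glm_vars <- all.vars(delete.response(terms(fit_glm)))

# Keep only those columns of the (here: stand-in) prediction data frame.
reduced <- mtcars[, glm_vars, drop = FALSE]
```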
