开发者

break dataframe into subsets by factor values, send to function that returns glm class, how to recombine?

开发者 https://www.devze.com 2023-01-02 10:01 出处:网络
Thanks to Hadley\'s plyr package ddply function we can take a dataframe, break it down into subdataframes by factors, send each to a function, and then combine the function results for each subdatafra

Thanks to Hadley's plyr package ddply function we can take a dataframe, break it down into subdataframes by factors, send each to a function, and then combine the function results for each subdataframe into a new dataframe.

But what if the function returns an object of a class like glm or in my case, a c("glm", "lm"). Then, these can't be combined into a dataframe can they? I get this error instead

Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class 'c("glm", "lm")' into a data.frame

Is there some more flexible data st开发者_开发知识库ructure that will accommodate all the complex glm class results of my function calls, preserving the information regarding the dataframe subsets?

Or should this be done in an entirely different way?


Just to expand my comment: plyr has set of functions to combine input and output type. So when you function returns something inconvertible to data.frame you should use list as output. So instead of using ddply use dlply.

When you want to do something on each model and convert results to data.frame then ldply is the key.

Lets create some models using dlply

list_of_models <- dlply(warpbreaks, .(tension), function(X) lm(breaks~wool, data=X))
str(list_of_models, 1)
# List of 3
#  $ L:List of 13
#   ..- attr(*, "class")= chr "lm"
#  $ M:List of 13
#   ..- attr(*, "class")= chr "lm"
#  $ H:List of 13
#   ..- attr(*, "class")= chr "lm"
#  - attr(*, "split_type")= chr "data.frame"
#  - attr(*, "split_labels")='data.frame':        3 obs. of  1 variable:

It gives list of three lm models.

Using ldply you could create a data.frame, e.g.

  • with predictions of each model:

    ldply(list_of_models, function(model) {
        data.frame(fit=predict(model, warpbreaks))
    })
    #     tension     fit
    # 1         L 44.5556
    # 2         L 44.5556
    # 3         L 44.5556
    
  • with statistics to each model:

    ldply(list_of_models, function(model) {
      c(
        aic = extractAIC(model),
        deviance = deviance(model),
        logLik = logLik(model),
        confint = confint(model),
        coef = coef(model)
      )
    })
    # tension aic1    aic2 deviance   logLik confint1  confint2 confint3 confint4 coef.(Intercept) coef.woolB
    # 1       L    2 98.3291  3397.78 -72.7054  34.2580 -30.89623  54.8531 -1.77044          44.5556  -16.33333
    # 2       M    2 81.1948  1311.56 -64.1383  17.6022  -4.27003  30.3978 13.82559          24.0000    4.77778
    # 3       H    2 76.9457  1035.78 -62.0137  18.8701 -13.81829  30.2411  2.26273          24.5556   -5.77778
    
0

精彩评论

暂无评论...
验证码 换一张
取 消