How does ddply handle factors as "split" variables?

Developer https://www.devze.com 2023-02-19 19:21 Source: web

Related topics: plyr, r

I have a data.frame with 20 columns. The first two are factors, and the rest are numeric. I'd like to use the first two columns as split variables and then apply the mean() to the remaining columns.

This seems like a quick and easy job for ddply(). However, the output data.frame is not what I am looking for. Here is a minimal example with just one column of data:

library(plyr)

Aa <- rep(c("A", "a"), each = 20)
Bb <- rep(c("B", "b", "B", "b"), each = 10)
x <- runif(40)
# stringsAsFactors = TRUE keeps Aa and Bb as factors in R >= 4.0
df1 <- data.frame(Aa, Bb, x, stringsAsFactors = TRUE)

ddply(df1, .(Aa, Bb), mean)

The output is:

  Aa Bb         x
1 NA NA 0.5193275
2 NA NA 0.4491907
3 NA NA 0.4848128
4 NA NA 0.4717899
Warning messages:
1: In mean.default(X[[1L]], ...) :
  argument is not numeric or logical: returning NA

The warning is repeated 8 times, presumably once for each call to mean(). I'm guessing this comes from trying to take the mean of a factor. I could write this as:

ddply(df1, .(Aa, Bb), function(df1) mean(df1$x))

ddply(df1, .(Aa, Bb), summarize, x = mean(x))

both of which do work (not giving NAs), but I would rather avoid writing out 18 such x = mean(x) statements, one for each of my numeric columns.

Is there a general solution? I'm not wedded to ddply if there is a better answer elsewhere.

Since you are reducing the number of rows, you need to use summarise:

> ddply(df1, .(Aa, Bb), summarise, mean_x = mean(x))
  Aa Bb    mean_x
1  a  b 0.3790675
2  a  B 0.4242922
3  A  b 0.5622329
4  A  B 0.4574471
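
The NA columns in the question arise because ddply() hands each piece to the function as a whole data.frame, split columns included, so mean() never receives a plain numeric vector. Here is a minimal base-R sketch of that splitting step. It uses split() as a rough stand-in for what ddply() does, not plyr's actual internals, and the seed is added only for reproducibility:

```r
set.seed(1)  # not in the question; added so the example is reproducible
df1 <- data.frame(
  Aa = rep(c("A", "a"), each = 20),
  Bb = rep(c("B", "b", "B", "b"), each = 10),
  x  = runif(40),
  stringsAsFactors = TRUE
)

# One data.frame per Aa/Bb combination, roughly what the function receives
pieces <- split(df1, list(df1$Aa, df1$Bb))

length(pieces)              # number of pieces
sapply(pieces[[1]], class)  # Aa and Bb are factors, only x is numeric

# Taking the mean of the numeric column alone works for every piece
piece_means <- sapply(pieces, function(piece) mean(piece$x))
piece_means
```

Each piece still carries Aa and Bb, which is why a function applied to the whole piece has to skip or drop those columns itself.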

It's just as easy to use aggregate in this instance. Let's say you had two variables:

> aggregate(df1[-(1:2)], df1[1:2], mean)
  Aa Bb         x         y
1  a  b 0.4249121 0.4639192
2  A  b 0.6127175 0.4639192
3  a  B 0.4522292 0.4826715
4  A  B 0.5201965 0.4826715
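
The y column in that output is not defined in the question. As a sketch of how the call generalizes, assume a hypothetical second numeric column y is added to df1. The formula interface, where `.` stands for all remaining columns, should give the same group means:

```r
set.seed(1)  # added only for reproducibility
df1 <- data.frame(
  Aa = rep(c("A", "a"), each = 20),
  Bb = rep(c("B", "b", "B", "b"), each = 10),
  x  = runif(40),
  y  = runif(40),  # hypothetical second numeric column
  stringsAsFactors = TRUE
)

# Index form: every column except the first two, grouped by the first two
by_index <- aggregate(df1[-(1:2)], df1[1:2], mean)

# Formula form: "." means all columns not named on the right-hand side
by_formula <- aggregate(. ~ Aa + Bb, data = df1, FUN = mean)

by_index
by_formula
```

Either form scales to 18 numeric columns without naming them one by one.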

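ddply passes each piece to the function as a data.frame, so negative indexing can drop the two split columns. Note that mean() no longer accepts a data.frame in current versions of R, so use colMeans() on the remaining columns:

ddply(df1, .(Aa, Bb), function(x) colMeans(x[-(1:2)]))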
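
For the original 20-column case, plyr also provides numcolwise(), which wraps a function so that it is applied to each numeric column of a data.frame piece. Here is a sketch, assuming plyr is installed and using a hypothetical extra column y to stand in for the remaining numeric columns:

```r
library(plyr)

set.seed(1)  # added only for reproducibility
df1 <- data.frame(
  Aa = rep(c("A", "a"), each = 20),
  Bb = rep(c("B", "b", "B", "b"), each = 10),
  x  = runif(40),
  y  = runif(40),  # hypothetical second numeric column
  stringsAsFactors = TRUE
)

# numcolwise(mean) skips the factor columns and averages the numeric ones
res <- ddply(df1, .(Aa, Bb), numcolwise(mean))
res
```

The result has one row per Aa/Bb combination and one mean per numeric column, with no per-column x = mean(x) statements to write.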
