In R, how to use "aggregate" or "by" when not all combinations of factors are present?

Developer https://www.devze.com 2023-04-11 05:03 Source: web

Here is a small example to illustrate my data:

> df <- data.frame(subgroup=rep(paste("s",1:3, sep=""), times=3),
                   feature=c(rep("a",6), rep("b",3)),
                   var=rep(1:3, each=3),
                   data=c(rnorm(3,1), rnorm(3,2), rnorm(3,0)))
> df
  subgroup feature var        data
1       s1       a   1  1.53152620
2       s2       a   1  1.25476445
3       s3       a   1  1.04221040
4       s1       a   2  1.68913400
5       s2       a   2  1.48290273
6       s3       a   2  1.62871854
7       s1       b   3  0.05278296
8       s2       b   3 -0.66623654
9       s3       b   3 -1.40006454

I want to examine the sum of the "data" column for each feature/var combination that is present in my dataset. More precisely, I want TRUE when the sum is greater than 3, and FALSE otherwise:

> result
  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE

I tried using "aggregate" and "by", but couldn't make either fit my needs. Any ideas? Thanks in advance.


One approach is plyr's ddply() function, grouping on feature and var. Passing summarize as the function builds a new data.frame with one row per feature/var combination present in the data, plus a column that implements your rule.

library(plyr)
# sum(data) > 3 already evaluates to TRUE/FALSE, so ifelse() is unnecessary
ddply(df, c("feature", "var"), summarize, res = sum(data) > 3)

Results in:

  feature var   res
1       a   1  TRUE
2       a   2  TRUE
3       b   3 FALSE
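Since the question asks about aggregate specifically, here is a minimal base-R sketch of the same computation. It rebuilds df from the values printed in the question (the original used rnorm(), so it is not reproducible) and relies on the formula interface of aggregate(), which only returns feature/var combinations that actually occur in the data. The variable name result is an assumption chosen to match the question.

```r
# Rebuild the example data using the values printed in the question,
# so the output is reproducible (the original used rnorm()).
df <- data.frame(subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
                 feature  = c(rep("a", 6), rep("b", 3)),
                 var      = rep(1:3, each = 3),
                 data     = c(1.53152620, 1.25476445, 1.04221040,
                              1.68913400, 1.48290273, 1.62871854,
                              0.05278296, -0.66623654, -1.40006454),
                 stringsAsFactors = FALSE)

# aggregate() groups only by combinations that occur in df,
# so absent pairs such as (b, 1) never appear in the result.
result <- aggregate(data ~ feature + var, data = df,
                    FUN = function(x) sum(x) > 3)

# aggregate() names the summary column after the summarised variable;
# rename it to match the desired output.
names(result)[names(result) == "data"] <- "res"
result
```

The renaming step is purely cosmetic; the grouping and the TRUE/FALSE rule are the same as in the ddply version.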

Another alternative is data.table, which is designed for fast grouped operations on large datasets:

library(data.table)
dt <- data.table(df)

# Only combinations present in dt are returned; the unnamed result column is called V1
dt[, sum(data) > 3, by = c("feature", "var")]

     feature var    V1
[1,]       a   1  TRUE
[2,]       a   2  TRUE
[3,]       b   3 FALSE
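The difficulty with by() is that, given two separate grouping factors, it builds the full feature x var grid (2 x 3 = 6 cells), so the three combinations absent from the data come back as NA cells. A minimal base-R sketch, assuming the same reconstructed df as above, collapses the two factors into one with interaction(drop = TRUE) so that only the present combinations remain. The names full, grp, present and result_by are illustrative.

```r
# Rebuild the example data from the values printed in the question.
df <- data.frame(subgroup = rep(paste("s", 1:3, sep = ""), times = 3),
                 feature  = c(rep("a", 6), rep("b", 3)),
                 var      = rep(1:3, each = 3),
                 data     = c(1.53152620, 1.25476445, 1.04221040,
                              1.68913400, 1.48290273, 1.62871854,
                              0.05278296, -0.66623654, -1.40006454),
                 stringsAsFactors = FALSE)

# Two grouping factors: by() returns the full 2 x 3 grid,
# with NA in the cells for combinations absent from df.
full <- by(df, df[c("feature", "var")], function(d) sum(d$data) > 3)

# One combined factor with unused levels dropped: only
# the combinations that actually occur are kept.
grp <- interaction(df$feature, df$var, drop = TRUE, sep = "_")
present <- by(df, grp, function(d) sum(d$data) > 3)

# Split the "feature_var" labels back into columns to get
# a data frame shaped like the desired result.
keys <- strsplit(dimnames(present)[[1]], "_", fixed = TRUE)
result_by <- data.frame(feature = sapply(keys, `[`, 1),
                        var     = as.integer(sapply(keys, `[`, 2)),
                        res     = as.vector(unclass(present)),
                        stringsAsFactors = FALSE)
result_by
```

Splitting on "_" assumes the feature labels themselves contain no underscore; with other labels, pick a separator that cannot occur in the data.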