I have the following data frame:
data<-data.frame(ID=c("a", "b", "c", "d"), zeros=c(3,2,5,4), ones=c(1,1,2,1))
  ID zeros ones
1  a     3    1
2  b     2    1
3  c     5    2
4  d     4    1
and I wish to create another data frame with two columns.
In the first column (id), each ID is repeated (zeros + ones) times. In the second column (value), each ID gets c(rep(0, zeros), rep(1, ones)),
so that the result would be
   id value
1   a     0
2   a     0
3   a     0
4   a     1
5   b     0
6   b     0
7   b     1
8   c     0
9   c     0
10  c     0
11  c     0
12  c     0
13  c     1
14  c     1
15  d     0
16  d     0
17  d     0
18  d     0
19  d     1
I tried
data.frame(id=(rep(data$ID, (data$zeros+data$ones))), value=c(rep(0, data$zeros), rep(1, data$ones)))
but it doesn't work. Any ideas? Thank you in advance.
This is perhaps overkill, using ddply from the plyr package, but it's the first thing that came to me:
ddply(dat,.(ID),function(x){data.frame(value = rep(c(0,1),times = c(x$zeros,x$ones)))})
Oh, and I changed the name of your data frame to dat to avoid a bad habit (data is the name of an oft-used function).
Here's a base R solution. I prefer the overkill of plyr myself:
dat <- data.frame(ID = letters[1:4], zeros = c(3,2,5,4), ones = c(1,1,2,1))
do.call("rbind",
  apply(dat, 1, function(x)
    data.frame(cbind(id = x[1], value = rep(0:1, times = x[2:3])))
  )
)
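One thing to watch with the apply() version: apply() converts the data frame to a matrix first, and because ID is text, every row reaches the function as character strings, so the resulting value column is not numeric. A minimal sketch of a variant that loops over row indices instead and keeps value as integers (the names pieces and result are just illustrative):

```r
dat <- data.frame(ID = letters[1:4], zeros = c(3, 2, 5, 4), ones = c(1, 1, 2, 1))

# Build one small data frame per row of dat, indexing the columns directly
# so that the counts stay numeric.
pieces <- lapply(seq_len(nrow(dat)), function(i) {
  data.frame(id = dat$ID[i],
             value = rep(0:1, times = c(dat$zeros[i], dat$ones[i])))
})

# Stack the pieces and reset the row names to 1..n.
result <- do.call(rbind, pieces)
rownames(result) <- NULL
```

This still grows the result piece by piece, so for large inputs a fully vectorised rep() approach is likely to be faster.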
Since you've already got a base R solution for the first column, this is one for your second column:
lengths <- as.vector(t(as.matrix(data[, 2:3]))) # note the t(): it interleaves zeros and ones row by row
what <- rep(c(0, 1), nrow(data))
times <- rep(what, lengths)
Edit: changed a minor thing above and tested it. It works now.
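For completeness, here is a sketch that combines the id column from the question's own attempt with the value column built as above, using the question's example data (the names counts and result are just illustrative):

```r
dat <- data.frame(ID = c("a", "b", "c", "d"),
                  zeros = c(3, 2, 5, 4),
                  ones = c(1, 1, 2, 1))

# Interleave the counts row by row: zeros[1], ones[1], zeros[2], ones[2], ...
counts <- as.vector(t(as.matrix(dat[, c("zeros", "ones")])))

result <- data.frame(
  # each ID repeated (zeros + ones) times
  id = rep(dat$ID, times = dat$zeros + dat$ones),
  # the pattern 0, 1, 0, 1, ... with each element repeated by its count
  value = rep(rep(c(0, 1), times = nrow(dat)), times = counts)
)
```

The attempt in the question fails because rep(0, data$zeros) passes a vector of four counts as times for a single value; rep() needs times to be either a single number or one count per element being repeated.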
I also prefer the plyr method, but I thought I'd throw in another base R solution, which reshapes the data first and then replicates it (also using dat instead of data):
names(dat)[2:3] <- c("times.0", "times.1")
tmp <- reshape(dat, varying=2:3, direction="long")
tmp <- tmp[rep(seq(length=nrow(tmp)),tmp$times),c("ID","time")]
names(tmp) <- c("id","value")
tmp <- tmp[order(tmp$id, tmp$value),]
rownames(tmp) <- NULL
Not as elegant as some of the other base solutions because it requires intermediate storage, but possibly interesting.