Rep values from a data frame to another data frame. apply? sapply?

开发者 https://www.devze.com 2023-03-31 07:56 Source: the web

I have the following data frame:

data<-data.frame(ID=c("a", "b", "c", "d"), zeros=c(3,2,5,4), ones=c(1,1,2,1))


   ID zeros ones
1  a     3    1
2  b     2    1
3  c     5    2
4  d     4    1

and I wish to create another data frame with 2 columns:

First column (id): the ID repeated (zeros + ones) times.

Second column (value): c(rep(0, zeros), rep(1, ones)) for each ID.

so that the result would be

I tried

data.frame(id=(rep(data$ID, (data$zeros+data$ones))), value=c(rep(0, data$zeros), rep(1, data$ones)))

but it doesn't work. Any ideas? Thank you in advance.

This is perhaps overkill, using ddply from the plyr package, but it's the first thing that came to me:

library(plyr)
ddply(dat, .(ID), function(x) data.frame(value = rep(c(0, 1), times = c(x$zeros, x$ones))))

Oh, and I changed the name of your data frame to dat to avoid a bad habit (data is the name of an often-used function).

Here's a base R solution. I prefer the overkill of plyr myself:

dat <- data.frame(ID = letters[1:4], zeros = c(3,2,5,4), ones = c(1,1,2,1))

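do.call("rbind"
    , apply(dat, 1, function(x)
        # apply() hands each row over as a character vector, so convert the counts back to integers
        data.frame(id = x[["ID"]], value = rep(0:1, times = as.integer(x[c("zeros", "ones")])))
    )
)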
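One thing to watch with apply() on a data frame: it converts the data frame to a matrix first, so with a character ID column every row arrives as a character vector. A minimal sketch that avoids the conversion by walking the columns in parallel with Map() (the names pieces and result are only illustrative):

```r
dat <- data.frame(ID = letters[1:4], zeros = c(3, 2, 5, 4), ones = c(1, 1, 2, 1))

# Map() passes the i-th ID, zeros and ones to the function with their original types
pieces <- Map(
  function(id, z, o) data.frame(id = id, value = rep(0:1, times = c(z, o))),
  as.character(dat$ID), dat$zeros, dat$ones
)

# Drop the list names so rbind() does not use them in the row names
result <- do.call(rbind, unname(pieces))
str(result)
```

Here value stays numeric, which is handy if you want to sum or average it afterwards.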

Since you've already got a base R solution for the first column, this is one for your second column:

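# Interleave the counts row by row (hence the t): zeros[1], ones[1], zeros[2], ...
counts <- as.vector(t(as.matrix(data[, 2:3])))
# One 0 and one 1 per row of data
what <- rep(c(0, 1), nrow(data))
# Repeat each 0/1 by its count; this is the second column
value <- rep(what, counts)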

Edit: changed a minor thing above and tested it. It works now.
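Putting the two columns together gives the full data frame. The sketch below is one way to assemble it in base R, assuming the data frame is called dat as in the other answers; counts, value and result are names chosen here for illustration. The original attempt most likely fails because rep(0, data$zeros) passes four counts for a single value, while rep() expects times to be either one number or one number per element.

```r
dat <- data.frame(ID = c("a", "b", "c", "d"), zeros = c(3, 2, 5, 4), ones = c(1, 1, 2, 1))

# Interleave the counts row by row: zeros[1], ones[1], zeros[2], ones[2], ...
counts <- as.vector(t(as.matrix(dat[, c("zeros", "ones")])))

# One 0 and one 1 per row, each repeated by its matching count
value <- rep(rep(c(0, 1), times = nrow(dat)), times = counts)

result <- data.frame(
  id = rep(dat$ID, times = dat$zeros + dat$ones),
  value = value
)

head(result, 4)  # the four rows for ID "a"
```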

I also prefer the plyr method, but I thought I'd throw in another base R solution that reshapes the data first and then replicates it (also using dat instead of data):

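names(dat)[2:3] <- c("times.0", "times.1")
# Long format: one row per (ID, time) pair, with the count in the times column
tmp <- reshape(dat, varying = 2:3, direction = "long")
# Repeat each row as many times as its count and keep only ID and time
tmp <- tmp[rep(seq_len(nrow(tmp)), tmp$times), c("ID", "time")]
names(tmp) <- c("id", "value")
tmp <- tmp[order(tmp$id, tmp$value), ]
rownames(tmp) <- NULL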

Not as elegant as some of the other base solutions because it requires intermediate storage, but possibly interesting.
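For a look at the intermediate storage mentioned above, here is a small sketch of the long format that reshape() produces before the rows are replicated. It assumes the count columns are already named times.0 and times.1; long is an illustrative name.

```r
dat <- data.frame(ID = letters[1:4], times.0 = c(3, 2, 5, 4), times.1 = c(1, 1, 2, 1))

# reshape() splits "times.0" / "times.1" at the dot:
# "times" becomes the value column and 0 / 1 end up in a column called time
long <- reshape(dat, varying = c("times.0", "times.1"), direction = "long")

# One row per (ID, time) pair; the times column holds how often to repeat it
print(long[order(long$ID, long$time), c("ID", "time", "times")])
```

Each row of long is then repeated times-many times, which is what the indexing with rep() does.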