Transform a data.frame, while filling missing values_问答_开发者

Transform a data.frame, while filling missing values

开发者 https://www.devze.com 2023-04-08 01:26 出处：网络

I have the data frame data<-data.frame(id=开发者_开发问答c(\"A\",\"A\",\"B\",\"B\"), day=c(5,6,1,2), duration=c(12,1440,5,6), obs.period=c(60, 60,100,100))

相关专题：

I have the data frame

data<-data.frame(id=开发者_开发问答c("A","A","B","B"), day=c(5,6,1,2), duration=c(12,1440,5,6), obs.period=c(60, 60,100,100))

showing Subject ID, day of event, duration of event, and observation period of Subject

I want to transform the data set to that it will show the whole observation period for each subject (all days of observation), while adding zero as duration values for the days where no event was observed

For the above dataset this would be something like this:

id  day duration    obs.period
A   1   0   60
A   2   0   60
A   3   0   60
A   4   0   60
A   5   12  60
A   6   1440    60
A   7   0   60
A   8   0   60
    .       
    .       
    .       
A   60  0   60
B   1   5   100
B   2   6   100
B   3   0   100
B   4   0   100
    .       
    .       
    .       
    .       
B   100 0   100

Any ideas?

Here's one approach using the plyr package. First, create a function to expand the data into the appropriate number of rows. Then, index into that new data.frame with the duration info from the original data. Finally, call this function with ddply() and group on the id variable.

require(plyr)
FUN <- function(x){
  dat <- data.frame(
    id = x[1,1]
    , day = seq_len(x[1,4])
    , duration = 0
    , obs.period = x[1,4]
    )

  dat[dat$id == x$id & dat$day == x$day, "duration"] <- x$duration
  return(dat)
}


ddply(data, "id", FUN)

    id day duration obs.period
1    A   1        0         60
2    A   2        0         60
3    A   3        0         60
4    A   4        0         60
5    A   5       12         60
6    A   6     1440         60
...
61   B   1        5        100
62   B   2        6        100
63   B   3        0        100
...
160  B 100        0        100

Create an empty data frame with the proper index columns, but no value columns, then merge it with your data and replace the NA's in the value columns with zeros.

data<-data.frame(id=c("A","A","B","B"), day=c(5,6,1,2), duration=c(12,1440,5,6), obs.period=c(60, 60,100,100))
zilch=data.frame(id=rep(c("A","B"),each=60),day=1:60)
all=merge(zilch,data, all=T)
all[is.na(all$duration),"duration"]<-0
all[is.na(all$obs.period),"obs.period"]<-0

I would first create a data frame to contain the results.

ob.period <- with(data, tapply(obs.period, id, max))

n <- sum(ob.period)
result <- data.frame(id=rep(names(ob.period), ob.period),
                     day=unlist(lapply(ob.period, function(a) 1:a)),
                     duration=rep(0, n),
                     obs.period=rep(ob.period,ob.period))

Then I would paste id and day together, use match to find the relevant rows in the larger data frame, and plug in the duration values.

idday.sm <- paste(data$id, data$day, sep=":")
idday.lg <- paste(result$id, result$day, sep=":")

result$duration[match(idday.sm, idday.lg)] <- data$duration

Here is an approach with plyr

fill1 <- function(df) {
  full_period <- 1:100
  to_fill <- setdiff(full_period, df$day)
  fill_id <- df[1,"id"]
  fill_dur <- 0
  fill_obs.p <- df[1,"obs.period"]
  rows_to_add <- data.frame(id=fill_id, day=to_fill, duration=fill_dur, obs.period=fill_obs.p)
  rbind(df,rows_to_add)
}
ddply(data, "id", fill1)

The result is not sorted by id, duration, however.