I have a set of measurements taken at regular intervals (every four minutes), but some are missing (here the 13:20 reading):
measurement_date value
1 2011-01-17 13:00:00 5
2 2011-01-17 13:04:00 5
3 2011-01-17 13:08:00 7
4 2011-01-17 13:12:00 8
5 2011-01-17 13:16:00 4
6 2011-01-17 13:24:00 6
7 2011-01-17 13:28:00 5
8 2011-01-17 13:32:00 6
9 2011-01-17 13:36:00 9
10 2011-01-17 13:40:00 8
11 2011-01-17 13:44:00 6
12 2011-01-17 13:48:00 6
13 2011-01-17 13:52:00 4
14 2011-01-17 13:56:00 6
I have a function that will process the values and can handle missing values, but the row has to be there, so I'm generating an array that has a row for every minute, like this:
times <- timeSequence(from=.., length=60, by="min")
Now I have a row for each minute of the hour but I need to merge the data. I tried something like this but couldn't quite get it right:
lapply(times, function(time) {
n <- as.numeric(time)
v <- Position(function(candidate) {
y <- as.numeric(candiated)
n == y
}
.. insert the value into the row here ..
}
but I'm only getting errors and warnings. Am I going about the problem the right way? I really want a "complete" array with a value per minute, because many different functions will be run on the readings, and they are much easier to implement if they can assume that every minute is present.
Recreating the example data as DF, plus a complete one-minute grid, full:
DF <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
as.POSIXct("2011-01-17 13:56:00"),
by = "mins")[seq(1, 57, by = 4)][-6],
value = c(5,5,7,8,4,6,5,6,9,8,6,6,4,6))
full <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
by = "mins", length = 60),
value = rep(NA, 60))
Two approaches can be used, the first via merge():
> v1 <- merge(full, DF, by.x = 1, by.y = 1, all = TRUE)[, c(1,3)]
> names(v1)[2] <- "value" ## I only reset this to pass all.equal later
> head(v1)
measurement_date value
1 2011-01-17 13:00:00 5
2 2011-01-17 13:01:00 NA
3 2011-01-17 13:02:00 NA
4 2011-01-17 13:03:00 NA
5 2011-01-17 13:04:00 5
6 2011-01-17 13:05:00 NA
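A variation that may avoid the clean-up step (a sketch, not part of the original answer): merge only the date column of the grid, so DF's value column is the only value column in the result and nothing needs dropping or renaming. The names grid and v1b are illustrative, and the data are rebuilt so the block runs on its own.

```r
## Rebuild the example data: readings every 4 minutes, 13:20 missing
DF <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                        as.POSIXct("2011-01-17 13:56:00"),
                                        by = "mins")[seq(1, 57, by = 4)][-6],
                 value = c(5, 5, 7, 8, 4, 6, 5, 6, 9, 8, 6, 6, 4, 6))

## Hypothetical grid holding only the timestamps (no empty value column)
grid <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                          by = "mins", length.out = 60))

## all.x = TRUE keeps every minute of 'grid'; minutes absent from DF get NA.
## merge() sorts the result by the key, so rows come out in time order.
v1b <- merge(grid, DF, by = "measurement_date", all.x = TRUE)
head(v1b)
```

Because the grid carries no value column, there is no value.x/value.y pair to sort out afterwards.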
The second is via an indicator variable derived using %in%:
> want <- full$measurement_date %in% DF$measurement_date
> full[want, "value"] <- DF[, "value"]
> head(full)
measurement_date value
1 2011-01-17 13:00:00 5
2 2011-01-17 13:01:00 NA
3 2011-01-17 13:02:00 NA
4 2011-01-17 13:03:00 NA
5 2011-01-17 13:04:00 5
6 2011-01-17 13:05:00 NA
> all.equal(v1, full)
[1] TRUE
The merge version is strongly preferred, but it needs a little work: the empty value column coming from full has to be dropped and the remaining one renamed. The %in% solution only works here because the data are in time order in both DF and full, hence my preference for merge. It is easy to get (or ensure) the two objects in time order, however, so both approaches require a little finessing. We can modify the %in% approach to get both objects in order (starting afresh with full):
full2 <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
by = "mins", length = 60),
value = rep(NA, 60))
full2 <- full2[order(full2[,1]), ] ## get full2 in order
DF2 <- DF[order(DF[,1]), ] ## get DF in order
want <- full2$measurement_date %in% DF2$measurement_date ## minutes that have a reading
full2[want, "value"] <- DF2[, "value"] ## fill those minutes, both objects in time order
> all.equal(full, full2)
[1] TRUE
> all.equal(full2, v1)
[1] TRUE
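To see why the ordering matters, here is a small sketch using a reversed copy of DF (the names DFs, bad and good are made up for this illustration). It also uses match(), which neither answer uses, as one possible order-independent way of doing the lookup.

```r
DF <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                        as.POSIXct("2011-01-17 13:56:00"),
                                        by = "mins")[seq(1, 57, by = 4)][-6],
                 value = c(5, 5, 7, 8, 4, 6, 5, 6, 9, 8, 6, 6, 4, 6))
times <- seq(as.POSIXct("2011-01-17 13:00:00"), by = "mins", length.out = 60)

## Same rows as DF, but in reverse time order
DFs <- DF[nrow(DF):1, ]

## %in% only says *which* minutes have data; the values are then pasted in
## DFs' row order, so with unsorted input they land on the wrong minutes
bad <- rep(NA_real_, 60)
bad[times %in% DFs$measurement_date] <- DFs$value

## match() gives, for each minute, the row of DFs with that timestamp (or NA),
## so the result does not depend on how DFs is sorted
good <- DFs$value[match(times, DFs$measurement_date)]

good[1:6]
bad[1:6]
```

Here bad starts with the 13:56 reading at 13:00, while good puts each value on its own minute.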
In your function, as.numeric(candiated) should be as.numeric(candidate). The closing parentheses for both Position() and lapply() are also missing, and Position() is never given the vector to search. I have no clue exactly what you're trying to achieve in your function, but it looks horrendously complex to me.
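If the aim of the lapply()/Position() attempt was a per-minute lookup, a corrected loop might look roughly like this. It is a sketch under the assumption that times is a POSIXct vector built with base seq() rather than timeSequence(); vals and filled are illustrative names.

```r
DF <- data.frame(measurement_date = seq(as.POSIXct("2011-01-17 13:00:00"),
                                        as.POSIXct("2011-01-17 13:56:00"),
                                        by = "mins")[seq(1, 57, by = 4)][-6],
                 value = c(5, 5, 7, 8, 4, 6, 5, 6, 9, 8, 6, 6, 4, 6))
times <- seq(as.POSIXct("2011-01-17 13:00:00"), by = "mins", length.out = 60)

## Iterate over positions so that times[i] is still a POSIXct value
vals <- vapply(seq_along(times), function(i) {
  hit <- which(DF$measurement_date == times[i])  # rows recorded at this minute
  if (length(hit) > 0) DF$value[hit[1]] else NA_real_  # NA if not measured
}, numeric(1))

## One row per minute, NA where there was no reading
filled <- data.frame(measurement_date = times, value = vals)
head(filled)
```

A loop like this scans DF once per minute; the merge() and %in% approaches above, or a single match() call, do the same lookup without an explicit loop.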
Try
merge(Data, times, by.x = 1, by.y = 1, all.y = TRUE)
where Data is your measurement data frame. This should give you something to work with.