Fellow R users of StackOverflow-
I need to match a set of distinct times to time intervals that are defined in another data frame by a start and an end. The result would be a single data frame with a many-to-one relation.
A simplified set of the records to match into follows:
vid start end
17599 7588 2011-02-14 19:00:00 2011-02-14 19:11:00
17601 7588 2011-02-14 19:58:00 2011-02-14 20:43:00
17603 7588 2011-02-14 21:22:00 2011-02-14 22:00:00
And some example records to match to the above data are:
vid datetime
469818 7588 2011-02-14 19:00:10
470747 7588 2011-02-14 19:59:10
470788 7588 2011-02-14 21:23:10
What I would like is something like:
vid datetime start end
7588 2011-02-14 19:00:10 2011-02-14 19:00:00 2011-02-14 19:11:00
7588 2011-02-14 19:59:10 2011-02-14 19:58:00 2011-02-14 20:43:00
7588 2011-02-14 21:23:10 2011-02-14 21:22:00 2011-02-14 22:00:00
For the life of me, I can't figure out how to do this in R. Any help would be greatly appreciated. Thank you!
Reproducible example:
txt1 <- " vid start end
17599 7588 '2011-02-14 19:00:00' '2011-02-14 19:11:00'
17601 7588 '2011-02-14 19:58:00' '2011-02-14 20:43:00'
17603 7588 '2011-02-14 21:22:00' '2011-02-14 22:00:00'
"
txt2 <- " vid datetime
469818 7588 '2011-02-14 19:00:10'
470747 7588 '2011-02-14 19:59:10'
470788 7588 '2011-02-14 21:23:10'
"
d1 <- read.table(textConnection(txt1), header = TRUE,
colClasses = c("integer","integer","POSIXct","POSIXct"))
d2 <- read.table(textConnection(txt2), header = TRUE,
colClasses = c("integer","integer","POSIXct"))
We can get the indexes (rows) in d1 that correspond to each row of d2 using:
> idx <- sapply(d2$datetime,
+ function(x, start, end) {which(x > start & x < end)},
+ d1$start, d1$end)
> idx
[1] 1 2 3
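The sapply() call above returns a plain integer vector only when every time falls inside exactly one interval; a time with no match or with several matches would turn the result into a list. Here is a rough sketch of a more defensive variant, under a few assumptions of mine that are not in the original: the comparison is inclusive (>= and <=), vid must also match, only the first matching interval is kept, and the data are rebuilt with tz = "UTC" so the block runs on its own.

```r
# Example data with the same values as d1 and d2 above (tz = "UTC" is an assumption).
d1 <- data.frame(
  vid   = c(7588L, 7588L, 7588L),
  start = as.POSIXct(c("2011-02-14 19:00:00", "2011-02-14 19:58:00",
                       "2011-02-14 21:22:00"), tz = "UTC"),
  end   = as.POSIXct(c("2011-02-14 19:11:00", "2011-02-14 20:43:00",
                       "2011-02-14 22:00:00"), tz = "UTC")
)
d2 <- data.frame(
  vid      = c(7588L, 7588L, 7588L),
  datetime = as.POSIXct(c("2011-02-14 19:00:10", "2011-02-14 19:59:10",
                          "2011-02-14 21:23:10"), tz = "UTC")
)

# For each row of d2, find the first row of d1 with the same vid whose
# interval contains the time; NA marks a time that falls in no interval.
idx <- vapply(seq_len(nrow(d2)), function(i) {
  hit <- which(d1$vid == d2$vid[i] &
               d2$datetime[i] >= d1$start &
               d2$datetime[i] <= d1$end)
  if (length(hit) == 0L) NA_integer_ else hit[1L]
}, integer(1))

# Rows of d2 without a match get NA in start and end.
res <- cbind(d2, d1[idx, c("start", "end")])
```

Because vapply() is told to expect integer(1), it fails loudly instead of silently changing the return type.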
And we can use the indexes idx to bind elements of d1 on to d2:
> cbind(d2, d1[idx, 2:3])
vid datetime start end
469818 7588 2011-02-14 19:00:10 2011-02-14 19:00:00 2011-02-14 19:11:00
470747 7588 2011-02-14 19:59:10 2011-02-14 19:58:00 2011-02-14 20:43:00
470788 7588 2011-02-14 21:23:10 2011-02-14 21:22:00 2011-02-14 22:00:00
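As an alternative to indexing, the same table can probably be built with a join followed by a filter. This is only a sketch using base R merge(), not the method used above: it pairs every time with every interval of the same vid and then keeps the pairs where the time lies inside the interval. The inclusive comparison and tz = "UTC" are my assumptions.

```r
# Example data with the same values as d1 and d2 above (tz = "UTC" is an assumption).
d1 <- data.frame(
  vid   = c(7588L, 7588L, 7588L),
  start = as.POSIXct(c("2011-02-14 19:00:00", "2011-02-14 19:58:00",
                       "2011-02-14 21:22:00"), tz = "UTC"),
  end   = as.POSIXct(c("2011-02-14 19:11:00", "2011-02-14 20:43:00",
                       "2011-02-14 22:00:00"), tz = "UTC")
)
d2 <- data.frame(
  vid      = c(7588L, 7588L, 7588L),
  datetime = as.POSIXct(c("2011-02-14 19:00:10", "2011-02-14 19:59:10",
                          "2011-02-14 21:23:10"), tz = "UTC")
)

# All (time, interval) combinations that share a vid: 3 x 3 = 9 rows here.
m <- merge(d2, d1, by = "vid")

# Keep only the combinations where the time lies inside the interval.
res <- m[m$datetime >= m$start & m$datetime <= m$end, ]
res <- res[order(res$datetime), ]
```

Two things to keep in mind with this variant: times that fall in no interval are dropped rather than kept with NA, and the intermediate table grows with the product of the row counts per vid, so it may not suit large inputs.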
I did, however, solve the problem by using the >= and <= operators to compare the start, end and datetime fields, after splitting the data with split():
match.col = with(d2[[v]], d1$datetime >= d2.start & d1$datetime <= d2.end)