Is there any way to bind data to data.frame by some index?

Developer https://www.devze.com 2023-02-13 16:30 Source: web

Related topics: r

#Say I have a situation like this
user_id = c(1:5,1:5)
time = c(1:10)
visit_log = data.frame(user_id, time)

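#And I've written a function to calculate the interval
interval <- function(data) {
    #The first row has no previous visit, so its interval is Inf
    interval = c(Inf)
    #Start at the second row so that data$time[i-1] always exists
    for (i in seq_along(data$time)[-1]) {
        intv = data$time[i]-data$time[i-1]
        interval = append(interval, intv)
    }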

    data$interval = interval
    return (data)
}

#But when I want to get the intervals per user_id and bind them to the data.frame,
#I can't find a proper way.
#Is there any method to get something like
new_data = merge(by(visit_log, INDICE=visit_log$user_id, FUN=interval))

#And the result should be
    user_id time interval
1        1    1      Inf
2        2    2      Inf
3        3    3      Inf
4        4    4      Inf
5        5    5      Inf
6        1    6        5
7        2    7        5
8        3    8        5
9        4    9        5
10       5   10        5

We can replace your loop with the diff() function, which computes the differences between adjacent elements of a vector. For example:

> diff(c(1,3,6,10))
[1] 2 3 4

We can then prepend Inf to the differences via c(Inf, diff(x)).
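As a quick check of that idiom, here is a minimal sketch on a made-up vector of visit times for a single user (the names `times` and `intervals` are only for illustration):

```r
# Hypothetical visit times for one user
times <- c(1, 6, 11)

# The first visit has no predecessor, so its interval is Inf;
# the remaining intervals are the successive differences
intervals <- c(Inf, diff(times))
intervals
```

The result has the same length as the input: Inf for the first visit, then 5 and 5.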

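The next thing we need is to apply the above to each user_id individually. There are many options for that, but here I use aggregate(). Confusingly, when the function passed to it returns more than one value per group, aggregate() returns a data frame whose time component is itself a matrix, with one row per user_id. We need to convert that matrix to a vector, relying upon the fact that R stores matrices column by column. This lines up with the input rows only when the data are laid out as in your example, with every user appearing once per block, in the same order each time. Finally, we add an interval column to the input data, as in your original version of the function.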

interval <- function(x) {
    diffs <- aggregate(time ~ user_id, data = x, function(y) c(Inf, diff(y)))
    diffs <- as.numeric(diffs$time)
    x <- within(x, interval <- diffs)
    x
}
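To see why the as.numeric() step is needed, it may help to look at the intermediate result on the data from the question. This is only an inspection sketch; the name `agg` is made up for illustration:

```r
# Same data as in the question
visit_log <- data.frame(user_id = c(1:5, 1:5), time = 1:10)

agg <- aggregate(time ~ user_id, data = visit_log,
                 FUN = function(y) c(Inf, diff(y)))

# agg$time is a matrix: one row per user, one column per visit
dim(agg$time)

# as.numeric() reads the matrix column by column, so we get the
# intervals of all first visits, then those of all second visits
as.numeric(agg$time)
```

Here the matrix has 5 rows and 2 columns, and the flattened vector is five Inf values followed by five 5s, which matches the row order of visit_log.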

Here is a slightly expanded example, with 3 time points per user, to illustrate the above function:

> visit_log = data.frame(user_id = rep(1:5, 3), time  = 1:15)
> interval(visit_log)
   user_id time interval
1        1    1      Inf
2        2    2      Inf
3        3    3      Inf
4        4    4      Inf
5        5    5      Inf
6        1    6        5
7        2    7        5
8        3    8        5
9        4    9        5
10       5   10        5
11       1   11        5
12       2   12        5
13       3   13        5
14       4   14        5
15       5   15        5
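If the rows are not arranged in such regular blocks, or users have different numbers of visits, the flattened matrix no longer matches the rows of the input. One possible alternative, which is not the approach used above, is base R's ave(): it applies a function within each group and writes the results back at the original row positions. A minimal sketch, where `interval_by_user` and `shuffled` are hypothetical names and each user's rows are assumed to already be in time order:

```r
interval_by_user <- function(x) {
    # ave() computes c(Inf, diff(time)) within each user_id and keeps
    # the original row order; as.numeric() makes sure Inf can be stored
    x$interval <- ave(as.numeric(x$time), x$user_id,
                      FUN = function(y) c(Inf, diff(y)))
    x
}

# Users interleaved irregularly and with unequal numbers of visits
shuffled <- data.frame(user_id = c(2, 1, 1, 2, 1),
                       time    = c(1, 2, 5, 7, 9))
result <- interval_by_user(shuffled)
result
```

For this input the interval column comes out as Inf, Inf, 3, 6, 4: each user's first visit gets Inf, and later visits get the gap to that same user's previous visit.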