Time-based averaging (sliding window) of columns in a data.frame

Developer https://www.devze.com 2023-01-21 09:08 Source: Internet
I have a data.frame with multiple columns. One of the columns is time and is thus non-decreasing. The rest of the columns contain observations recorded at the time given in that row.

I want to select a window of time, say "x" seconds, and calculate the average (or for that matter any function) of the entries in some other columns in the same data.frame for that window.

Of course, because it's a time-based average, the number of rows that fall into a given window can vary depending on the data.

I have done this using a custom function, which creates a new column in the data.frame. The new column assigns a single number to all the entries in a time window, and that number is unique across all the time windows. This essentially divides the data into groups based on the time windows. Then I use R's "aggregate" function to calculate the mean.

I was just wondering if there is an existing R function that can do the grouping based on a time interval, or if there is a better (cleaner) way to do this.


Assuming your data.frame contains only numeric data, this is one way to do it using zoo/xts:

> library(xts)  # also loads zoo
> Data <- data.frame(Time=Sys.time()+1:20,x=rnorm(20))
> xData <- xts(Data[,-1], Data[,1])
> period.apply(xData, endpoints(xData, "seconds", 5), colMeans)
                           [,1]
2010-10-20 13:34:19 -0.20725660
2010-10-20 13:34:24 -0.01219346
2010-10-20 13:34:29 -0.70717312
2010-10-20 13:34:34  0.09338097
2010-10-20 13:34:38 -0.22330363

EDIT: The same can be done using only base R packages. The means are the same, but the times are slightly different because endpoints starts the 5-second interval with the first observation, whereas the code below groups on 5-second intervals starting with seconds = 0.

> nSeconds <- 5
> agg <- aggregate(Data[,-1], by=list(as.numeric(Data$Time) %/% nSeconds), mean)
> agg[,1] <- .POSIXct(agg[,1]*nSeconds)  # >= R-2.12.0 required for .POSIXct
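If you would rather have the base R windows start at the first observation instead of at seconds = 0, one option is to group on the time elapsed since the first row. This is only a sketch: the fixed start time, the integer values in x, and the names elapsed, window and start are made up for illustration so that the result is reproducible.

```r
# Hypothetical data: 20 observations one second apart, with fixed values
Data <- data.frame(Time = as.POSIXct("2010-10-20 13:34:15", tz = "UTC") + 0:19,
                   x = 1:20)
nSeconds <- 5

# Seconds elapsed since the first observation
elapsed <- as.numeric(Data$Time) - as.numeric(Data$Time[1])

# Window index: 0 for the first nSeconds, 1 for the next nSeconds, and so on
window <- elapsed %/% nSeconds

# Mean of x within each window
agg <- aggregate(Data["x"], by = list(window = window), FUN = mean)

# Label each window with the time at which it starts
agg$start <- Data$Time[1] + agg$window * nSeconds
```

Any other summary function can be passed as FUN in place of mean.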


zoo has a rollapply() method. If you can't use it, I have rolled my own a few times. It isn't very difficult.
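Note that the width in rollapply() is counted in observations, not seconds, so on irregularly spaced data a hand-rolled version may be closer to what you want. A minimal sketch, assuming a trailing window that ends at each observation; the function name time_window_apply and the sample data are hypothetical:

```r
# Hypothetical helper: for each observation, apply FUN to all values whose
# time lies in the trailing window (t - width, t].
time_window_apply <- function(time, value, width, FUN = mean) {
  t <- as.numeric(time)
  vapply(seq_along(t), function(i) {
    in_window <- t > t[i] - width & t <= t[i]
    FUN(value[in_window])
  }, numeric(1))
}

time  <- c(0, 1, 4, 5, 7, 12, 13, 14)  # irregular timestamps, in seconds
value <- c(1, 2, 3, 4, 5, 6, 7, 8)
res <- time_window_apply(time, value, width = 5)
```

This compares every row against every other row, so it is quadratic in the number of rows; that is fine for small data but probably too slow for long series.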
