I am using R for some statistical analysis of time series. I have tried Googling around, but I can't seem to find any definitive answers. Can anyone who knows more please point me in the right direction?
Example:
Let's say I want to do a linear regression of two time series. The time series contain daily data, but there might be gaps here and there so the time series are not regular. Naturally I only want to compare data points where both time series have data. This is what I do currently to read the csv files into a data frame:
library(zoo)
apples <- read.csv('/Data/apples.csv', as.is=TRUE)    # as.is=TRUE keeps strings as character
oranges <- read.csv('/Data/oranges.csv', as.is=TRUE)
apples$date <- as.Date(apples$date, "%d/%m/%Y")       # parse day/month/year strings into Date
oranges$date <- as.Date(oranges$date, "%d/%m/%Y")
zapples <- zoo(apples$close,apples$date)               # closing prices indexed by date
zoranges <- zoo(oranges$close,oranges$date)
zdata <- merge(zapples, zoranges, all=FALSE)           # all=FALSE keeps only dates present in both
data <- as.data.frame(zdata)
Is there a slicker way of doing this?
Also, how can I slice the data, e.g., select the entries in data with dates within a certain period?
Try something along these lines. This assumes that the dates are in column 1. The dyn package can be used to transform lm, glm and many similar regression-type functions into ones that accept zoo series. Write dyn$lm in place of lm as shown:
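library(dyn) # also loads zoo
fmt <- "%d/%m/%Y"  # date format used in the CSV files
zapples <- read.zoo('apples.csv', header = TRUE, sep = ",", format = fmt)   # column 1 becomes the Date index
zoranges <- read.zoo('oranges.csv', header = TRUE, sep = ",", format = fmt)
zdata <- merge(zapples, zoranges)
dyn$lm(..whatever.., zdata)  # replace ..whatever.. with your model formula, e.g. zoranges ~ zapples if each file has a single data column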
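To see what the read.zoo/merge step produces without the real files, here is a minimal sketch that reads two small, made-up CSV strings through read.zoo's text argument. The prices, dates and the variable names apples_csv/oranges_csv are hypothetical stand-ins for apples.csv and oranges.csv; only zoo is needed.

```r
library(zoo)

fmt <- "%d/%m/%Y"
# Hypothetical toy data standing in for apples.csv and oranges.csv:
# apples has no row for 03/01/2011, oranges has no row for 04/01/2011
apples_csv <- "date,close
01/01/2011,10.0
02/01/2011,10.5
04/01/2011,11.2
05/01/2011,11.0"
oranges_csv <- "date,close
01/01/2011,20.1
02/01/2011,20.9
03/01/2011,21.4
05/01/2011,22.0"

# text = reads from a string instead of a file; column 1 becomes the Date index
zapples <- read.zoo(text = apples_csv, header = TRUE, sep = ",", format = fmt)
zoranges <- read.zoo(text = oranges_csv, header = TRUE, sep = ",", format = fmt)

# Default merge is an outer join: all 5 dates, with NA where a series has a gap
zdata <- merge(zapples, zoranges)
print(zdata)
```

Only 1, 2 and 5 January are complete rows; the other two dates carry one NA each, which is what the fitting step then has to skip.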
You don't need all = FALSE since lm will ignore rows with NAs under the default setting of its na.action argument.
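A minimal sketch of that NA handling, assuming toy series with made-up values. To keep it dependent on zoo alone, it fits plain lm() on a data-frame copy rather than dyn$lm; the column names apples/oranges are set explicitly for illustration.

```r
library(zoo)

# Hypothetical toy series, each missing one date that the other has
d <- as.Date("2011-01-01") + 0:4
zapples <- zoo(c(10.0, 10.5, 11.2, 11.0), d[c(1, 2, 4, 5)])
zoranges <- zoo(c(20.1, 20.9, 21.4, 22.0), d[c(1, 2, 3, 5)])

# Outer join (the default): 5 rows, 2 of which contain an NA
zdata <- merge(zapples, zoranges)
colnames(zdata) <- c("apples", "oranges")

# lm() falls back to getOption("na.action"), which is na.omit by default,
# so rows with an NA in either column are dropped before fitting
fit <- lm(oranges ~ apples, data = as.data.frame(zdata))
nobs(fit)   # 3: only the dates present in both series are used
```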
The window.zoo function (called as window() on a zoo object) can be used to slice data by date.
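For example, a minimal sketch on a hypothetical daily series (arbitrary values); the same call works on a multi-column merged object such as zdata.

```r
library(zoo)

# Hypothetical daily series for January 2011 (values are arbitrary)
z <- zoo(seq_len(31), as.Date("2011-01-01") + 0:30)

# Keep only observations from 10 to 20 January; both endpoints are included
jan_mid <- window(z, start = as.Date("2011-01-10"), end = as.Date("2011-01-20"))
length(jan_mid)   # 11
```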
Depending on what you want to do you might also want to look at the xts and quantmod packages.
Why did you convert both data frames to zoo, then merge and convert back to a data frame? If you want a data frame, just run this line after your read.csv() and as.Date() lines.
data <- merge(apples, oranges, by = "date")  # inner join on date; the two close columns become close.x and close.y
And here's how to subset.
subset(data, date < slicemax & date > slicemin)  # slicemin and slicemax are Date objects; the strict inequalities exclude both endpoints
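Putting the two lines together, here is a minimal base-R sketch using hypothetical toy data frames shaped like apples and oranges (dates already converted with as.Date); the slice bounds are illustrative values.

```r
# Hypothetical toy data frames shaped like the question's apples/oranges
apples <- data.frame(date = as.Date("2011-01-01") + c(0, 1, 3, 4),
                     close = c(10.0, 10.5, 11.2, 11.0))
oranges <- data.frame(date = as.Date("2011-01-01") + c(0, 1, 2, 4),
                      close = c(20.1, 20.9, 21.4, 22.0))

# Inner join on date: only 1, 2 and 5 January appear in both
data <- merge(apples, oranges, by = "date")
names(data)   # "date" "close.x" "close.y"

# Illustrative slice bounds; > and < exclude the endpoints themselves
slicemin <- as.Date("2011-01-01")
slicemax <- as.Date("2011-01-05")
sliced <- subset(data, date < slicemax & date > slicemin)
print(sliced)   # only 2 January remains
```

Use >= and <= instead if the bounding dates should be kept.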