I would like to calculate the number of periods that have elapsed since the 200 period high of a univariate time series. For example, here's the closing price of SPY:
require(quantmod)
getSymbols("SPY",from='01-01-1900')
Data <- Cl(SPY)
Now, I can find the 200-period highs of this series using the 开发者_JAVA百科Lag
function in quantmod:
periodHigh <- function(x,n) {
Lags <- Lag(x,1:n)
High <- x == apply(Lags,1,max)
x[High]
}
periodHigh(Data, 200)
But now I'm stuck. How do I merge this back onto the original series (Data
) and calculate, for each point in the series, how many periods have elapsed since the previous n-period high?
This little function returns a list with:
high
the index number of high datesrecentHigh
the index number of the most recent high daydaysSince
the number of days since the last highdata
an xts object with only the high days. Useful for plotting.
The code:
daysSinceHigh <- function(data, days){
highs <- days-1+which(apply(embed(data, days), 1, which.max)==1)
recentHigh <- max(highs)
daysSince <- nrow(data) - recentHigh
list(
highs=highs,
recentHigh = recentHigh,
daysSince = daysSince,
data=data[highs, ])
}
The results:
daysSinceHigh(Data, 200)$daysSince
[1] 90
plot(Data)
points(daysSinceHigh(Data, 200)$data, col="red")
The answer to your revised question:
require(zoo)
x <- sample(300:500, 1000, replace=TRUE)
str(rollapply(x, 200, function(x) which.max(x)))
# int [1:801] 14 13 12 11 10 9 8 7 6 5 ...
plot(x)
plot(200:1000, rollapply(x, 200, function(x) 200-which.max(x)))
So for the xts series:
plot( rollapply(coredata(Data), 200, function(x) 200-which.max(x)))
You obviously cannot merge anything back to the first 200 dates unless you apply a looser definition of rolling maximum. (In another SO session involving "shifty" data I showed how to use embed to pad the "trailing" periods: Data transformation in R but I don't know if you want to construct matrices that are 200 times as large as your input data.)I edited the code from the previous answers such that they are functions that take the same inputs (a univariate time series) and return the same output (a vector of days since the last n-day high):
daysSinceHigh1 <- function(x,n) {
as.vector(n-rollapply(x, n, which.max))
}
daysSinceHigh2 <- function(x, n){
apply(embed(x, n), 1, which.max)-1
}
The second function seems to be the fastest, but they're providing slightly different results:
> getSymbols("^GSPC",from='01-01-1900')
[1] "GSPC"
> system.time(x <- daysSinceHigh1(Cl(GSPC), 200))
user system elapsed
0.42 0.00 0.42
> system.time(y <- daysSinceHigh2(Cl(GSPC), 200))
user system elapsed
0.24 0.00 0.24
> all.equal(x,y)
[1] "Mean relative difference: 0.005025126"
Upon closer inspection, it appears that there are some weird edge cases in the 1st function:
data <- c(1,2,3,4,5,6,7,7,6,5,6,7,8,5,4,3,2,1)
answer <- c(0,0,0,0,1,2,3,0,0,1,2,3,4,4)
x <- daysSinceHigh1(data, 5)
y <- daysSinceHigh2(data, 5)
> x
[1] 0 0 0 1 2 3 4 4 0 1 2 3 4 4
> y
[1] 0 0 0 0 1 2 3 0 0 1 2 3 4 4
> answer
[1] 0 0 0 0 1 2 3 0 0 1 2 3 4 4
> all.equal(x,answer)
[1] "Mean relative difference: 0.5714286"
> all.equal(y,answer)
[1] TRUE
Therefore, it seems like the second function (based off Andrie's code) is better.
精彩评论