开发者

Surprisingly Slow Standard Deviation in R

开发者 https://www.devze.com 2023-04-05 18:18 出处:网络
I am calculating standard deviations on an expanding window where at each point I recalculate the standard deviation.This seems like a fairly straightforward thing to do that should be relatively fast

I am calculating standard deviations on an expanding window where at each point I recalculate the standard deviation. This seems like a fairly straightforward thing to do that should be relatively fast. However, it takes a lot longer than you might think (~45 seconds). Am I missing something here? In Matlab this is quite fast.

t0 <- proc.time()[[3]]
z <- rep(0, 7000)
x <- rnorm(8000)
for(i in 1000:8000){
##    print(i)
    z[i] <- sd(x[1开发者_JAVA技巧:i])
}
print(proc.time()[[3]]- t0)


You might also try an algorithm that updates the standard deviation (well, actually, the sum of squares of differences from the mean) as you go. On my system this reduces the time from ~0.8 seconds to ~0.002 seconds.

n <- length(x)
m <- cumsum(x)/(1:n)
m1 <- c(NA,m[1:(n-1)])
ssd <- (x-m)*(x-m1)
v <- c(0,cumsum(ssd[-1])/(1:(n-1)))
z <- sqrt(v)

See http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance for details.

Also see the answers to this question: Efficient calculation of matrix cumulative standard deviation in r


Edited to fix some typos, sorry.

This takes ~1.3 seconds on my machine:

t0 <- proc.time()[[3]]
x <- rnorm(8000)
z <- sapply(1000:8000,function(y){sd(x[seq_len(y)])})
print(proc.time()[[3]]- t0)

and I'd be willing to bet there are even faster ways of doing this. Avoid explicit for loops!


When a somewhat similar question about a cumulative variance and a cumularive kurtosis operation came up in rhelp a few days ago, here is what I offered :

daily <- rnorm(1000000)
mbar <- mean(daily)
cumvar <-  cumsum( (daily-cumsum(daily)/1:length(daily) )^2)
cumskew <- cumsum( (daily-cumsum(daily)/1:length(daily))^3)/cumvar^(3/2)

It's certainly faster than the sapply method but may be comparable to Aaron's.

 system.time( cumvar <-  cumsum( (daily-cumsum(daily)/1:length(daily) )^2) )
   user  system elapsed 
  0.037   0.026   0.061 
 system.time(cumsd <- sqrt(cumvar) )
   user  system elapsed 
  0.009   0.005   0.013 
0

精彩评论

暂无评论...
验证码 换一张
取 消