Run time - using apply functions

Developer https://www.devze.com 2023-01-15 09:41 Source: Internet
I have two apply functions computing the mean and standard deviation across the first two dimensions of a large three-dimensional array (437216, 8, 3). It takes 16 minutes to complete on Rx32. It's the first of many large arrays in a database that we will be applying this script to on a regular basis. Any thoughts on how to speed up the runtime?


That seems very slow. On my machine

set.seed(10)

x = array(rnorm(437216*8*3), dim = c(437216,8,3))

system.time(apply(x, 1, mean))

takes

   user  system elapsed 
 23.903   0.263  24.522 

FWIW,

system.time(apply(x, 2, mean))
   user  system elapsed 
  0.546   0.274   0.841 


system.time(apply(x, 3, mean))
   user  system elapsed 
  0.516   0.267   0.790 
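The gap between these timings is consistent with how often apply() has to call the function: once per cell of the chosen margin. A minimal sketch of that count, using a hypothetical helper `calls()` that is not part of the original code:

# Dimensions of the array from the question
dims <- c(437216, 8, 3)

# Hypothetical helper: number of times apply() invokes FUN for a MARGIN,
# i.e. the product of the extents of the margin dimensions
calls <- function(margin) prod(dims[margin])

calls(1)    # one call per row of the first dimension
calls(2)    # one call per level of the second dimension
calls(3)    # one call per level of the third dimension
calls(1:2)  # one call per (row, column) pair

With MARGIN = 1 the function runs 437216 times, and with MARGIN = 1:2 it runs 437216 * 8 = 3497728 times, each time on a vector of only three values, so the per-call overhead dominates.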

What is your sessionInfo()?

sessionInfo()
R version 2.11.1 (2010-05-31) 
i386-apple-darwin9.8.0 

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] cimis_0.1-3    RLastFM_0.1-4  RCurl_1.4-2    bitops_1.0-4.1 XML_3.1-0      lattice_0.18-8

loaded via a namespace (and not attached):
[1] grid_2.11.1  tools_2.11.1


My sessionInfo() is as follows:

sessionInfo()
R version 2.11.0 (2010-04-22) 
x86_64-pc-mingw32 

locale:
[1] LC_COLLATE=English_United States.1252 
[2] LC_CTYPE=English_United States.1252 
[3] LC_MONETARY=English_United States.1252 
[4] LC_NUMERIC=C 
[5] LC_TIME=English_United States.1252 

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] abind_1.1-0   RSQLite_0.9-1 DBI_0.2-5

The apply function is applied across both the first and second margins (1:2), which I believe is what makes it run so long; the system times are below. I ran it on a better computer (the system listed above), which cut the run time somewhat, but it still seems to take longer than it should:

>  system.time(apply(x,1:2,mean))   
user  system elapsed
311.56    0.30  311.88
> system.time(apply(x,1:2,sd))    
user  system elapsed
505.92    0.21  506.81

I'll look into converting it to a data.frame and unlisting it as in the second suggestion. Thanks for all the help!


EDIT: With the code provided by the OP, the problem became clear. The trick is to convert the array to a data frame:

> x = array(rnorm(437216*8*3), dim = c(437216,8,3))

> system.time(apply(x,1:2,mean))
   user  system elapsed 
 107.06    0.18  107.34 
 # This is run on a new quadcore i7, so it's not a slow machine...

> Tmp <- data.frame(V1=as.vector(x[,,1]),
+             V2=as.vector(x[,,2]),
+             V3= as.vector(x[,,3]))

> system.time({
+     Means <- rowMeans(Tmp)
+     Sd <- sqrt(rowSums((Tmp-Means)^2)/(3-1))
+ })
   user  system elapsed 
   6.72    0.40    7.12 

To get the results back into the correct matrix shape:

Means <- matrix(Means,ncol=8)
Sd <- matrix(Sd,ncol=8)

Proof of concept :

x = array(rnorm(10*8*3), dim = c(10,8,3))

m1 <- apply(x,1:2,mean)
sd1 <- apply(x,1:2,sd)

Tmp <- data.frame(V1=as.vector(x[,,1]),
            V2=as.vector(x[,,2]),
            V3= as.vector(x[,,3]))
m2 <- rowMeans(Tmp)

sd2 <- sqrt(rowSums((Tmp-m2)^2)/2)

m2 <-matrix(m2,ncol=8)
sd2 <- matrix(sd2,ncol=8)

> all.equal(m1,m2)
[1] TRUE

> all.equal(sd1,sd2)
[1] TRUE
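
As a possible alternative to the data frame step, rowMeans() and rowSums() accept a dims argument and can work on the array directly. The following is only a sketch under the same assumptions as the proof of concept (a small test array, three values along the third dimension); the names m3, sd3 and n are illustrative and not from the original answer:

set.seed(1)
x <- array(rnorm(10 * 8 * 3), dim = c(10, 8, 3))
n <- dim(x)[3]   # number of values per (row, column) cell

# dims = 2 treats the first two dimensions as "rows", so the mean is
# taken over the third dimension and the result is a 10 x 8 matrix
m3 <- rowMeans(x, dims = 2)

# Sample standard deviation: as.vector(m3) is recycled across the
# slices x[,,1], x[,,2], x[,,3], so each cell has its own mean subtracted
# before the squares are summed over the third dimension
sd3 <- sqrt(rowSums((x - as.vector(m3))^2, dims = 2) / (n - 1))

# Compare against the apply() results
all.equal(m3, apply(x, 1:2, mean))
all.equal(sd3, apply(x, 1:2, sd))

This skips building the intermediate data frame and returns matrices of the right shape without the matrix(..., ncol=8) step; whether it is faster on the full 437216 x 8 x 3 array would need to be timed.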