From for loop to apply

I am new to R, so I am not sure how to use apply. I would like to speed up my function by using apply:

for(i in 1: ncol(exp)){
 for (j in 1: length(fe)){
  tmp =TRUE
  id = strsplit(colnames(exp)[i],"\\.")
  if(id == fe[j]){
   tmp = FALSE
  }
  if(tmp ==TRUE){
   only = cbind(only,c(names(exp)[i],exp[,i]) )
  }
 }
}

How can I use the apply function to do the above?

EDIT :

Thank you so much for the very good explanation, and sorry for my bad description. You guessed everything right, but I wanted to delete the matches in fe.

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)

fe<-LETTERS[1:2]

Then the result should contain only the columns whose names start with 'C'. Everything else should be deleted.

EDIT : If you only want to delete the columns whose names appear in fe, you can simply do :

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe<-LETTERS[1:2]

id <- sapply(strsplit(names(Exp),"\\."),
    function(i)!i[1] %in% fe)
Exp[id]

This code does exactly what your (updated) for-loop does, only a lot more efficiently. You don't have to loop through fe, because the %in% operator is vectorized.

If the name can appear in any of the dot-separated parts, then use

id <- sapply(strsplit(names(Exp),"\\."),
    function(i)sum(i %in% fe)==0)
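As an alternative sketch (not part of the answer above), the prefix test can also be written without strsplit or sapply by removing everything from the first dot onwards with sub(). The names prefix and kept are made up for this illustration, and it assumes the part before the first dot is what should be compared with fe.

```r
Exp <- data.frame(A.x = 1:10, B.y = 10:1, C.z = 11:20, A.z = 20:11)
fe <- LETTERS[1:2]

# Strip everything from the first dot onwards to get the prefix of each name
prefix <- sub("\\..*$", "", names(Exp))

# Keep only the columns whose prefix is not listed in fe
kept <- Exp[!prefix %in% fe]
names(kept)
# "C.z"
```

Both sub() and %in% work on the whole vector of names at once, so no explicit loop is needed.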

Your code does some very odd things, and I have no clue what exactly you're trying to do. For one, strsplit returns a list, so id == fe[j] compares a list with a string and will not match for a name like A.x. So I'd correct your code to

id = strsplit(colnames(Exp)[i],"\\.")[[1]][1]

in case you want to compare with everything that is before the dot, or to

id = unlist(strsplit(colnames(Exp)[i],"\\."))

if you want to compare with everything in the string. In that case, you should use %in% instead of == as well.
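To see why the extraction step matters, here is a small sketch of what strsplit returns for a single column name. The variable names parts and first are only for illustration.

```r
fe <- LETTERS[1:2]

parts <- strsplit("A.x", "\\.")  # a list holding one character vector
is.list(parts)                   # TRUE

first <- parts[[1]][1]           # the piece before the first dot: "A"
first == fe[1]                   # TRUE

# Compare every piece of the name against all of fe in one call
any(unlist(parts) %in% fe)       # TRUE
```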

Second, what you get is a character matrix in which each column is repeated once for every element of fe that does not match it. If all elements in fe are unique, you could just as well do :

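mat <- rbind(names(Exp),Exp)
# assumes every column prefix matches exactly one element of fe,
# so each column is repeated length(fe)-1 times
only <- do.call(cbind,lapply(mat,function(x)
       matrix(rep(x,length(fe)-1),nrow=nrow(Exp)+1)
))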

Assuming that the logic in your code does make sense (as you didn't supply any sample data, this is impossible to know), the optimized version is :

mat <- rbind(names(Exp),Exp)

do.call(cbind,
    lapply(mat, function(x){
        n <- sum(!fe %in% strsplit(x[1],"\\.")[[1]][1])
        matrix(rep(x,n),nrow=nrow(mat))
}))

Note that - in case you are interested if fe[j] appears anywhere in the name - you can change the code to :

do.call(cbind,
    lapply(mat, function(x){
        n <- sum(!fe %in% unlist(strsplit(x[1],"\\.")))
        matrix(rep(x,n),nrow=nrow(mat))
}))

If this doesn't return what you want, then your code doesn't do that either. I checked with the following sample data, and all versions give the same result :

Exp <- data.frame(A.x=1:10,B.y=10:1,C.z=11:20,A.z=20:11)
fe <- LETTERS[1:4]

The apply() family of functions are convenience functions. They will not necessarily be faster than a well-written for loop or vectorized functions. For example:

set.seed(21)
x <- matrix(rnorm(1e6),5e5,2)

system.time({
  yLoop <- x[,1]*0  # preallocate result
  for(i in 1:NROW(yLoop)) yLoop[i] <- mean(x[i,])
})
#    user  system elapsed 
#   13.39    0.00   13.39 
system.time(yApply <- apply(x, 1, mean))
#    user  system elapsed 
#   16.19    0.28   16.51
system.time(yRowMean <- rowMeans(x))
#    user  system elapsed 
#    0.02    0.00    0.02
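# identical() compares only two objects, so check the results pairwise
identical(yLoop,yApply) && isTRUE(all.equal(yApply,yRowMean))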
# TRUE

The reason your code is so slow is that--as Gavin pointed out--you're growing your array for every loop iteration. Preallocate the entire array before the loop and you will see a significant speedup.
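A minimal sketch of the difference, using made-up data rather than the matrix from the question: the first loop grows the result with cbind() on every iteration, while the second fills a matrix that was allocated once. The names n, grown and filled are hypothetical.

```r
n <- 2000

# Growing: cbind() has to copy the whole matrix on every iteration
grown <- NULL
for (i in seq_len(n)) grown <- cbind(grown, c(i, i^2))

# Preallocating: create the full matrix once, then fill it column by column
filled <- matrix(0, nrow = 2, ncol = n)
for (i in seq_len(n)) filled[, i] <- c(i, i^2)

# Both approaches produce the same values
all.equal(grown, filled)
```

Wrapping each loop in system.time() should show the gap widening as n increases, although the exact timings depend on the machine and the R version.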