Mean of all vector values with unique names

Developer https://www.devze.com 2023-02-23 02:42 Source: web
I have a large list of non-unique named values, i.e.:

tscores
        11461         11461         11461         11461         14433
-1.966196e+01  7.808853e-01  2.065178e+01  5.630565e+00 -7.295436e+00
        14433         14433         14433         14433         14433
 2.036339e+00 -6.704906e+00  1.603803e+00 -1.118324e+01  1.450554e+00
        14102         16153         16189         18563         18563
-1.137429e+01  7.053336e-02  1.011208e+00 -7.811194e+00 -6.749376e-01
        18563         18563         22042         22042         22042
 7.480217e-01 -9.909211e-01 -9.577424e-01 -7.887699e-02 -4.867706e-01

I'd like to be able to pull out a subvector of all values that correspond to a name more efficiently. At the moment, I'm using:

u_tscores <- sapply(unique(names(tscores)), function(name, scores) {mean(scores[names(scores)==name])}, scores=tscores)

This is far too slow for what I need. I know there has to be an easier way to get all values with the same name.


Your best bet is to use lapply on the list obtained by split(tscores,names(tscores)). That gains you roughly a fivefold speedup:

n <- 1000000
tscores <- runif(n)
names(tscores) <- sample(letters,n,replace=TRUE)

system.time(
   X <- tapply(tscores, names(tscores), mean)
)
   user  system elapsed 
   0.89    0.00    0.89 

 system.time(
   X2 <- sapply(unique(names(tscores)), function(name, scores){   
            mean(scores[names(scores)==name])}, scores=tscores)
)
   user  system elapsed 
   0.73    0.05    0.78 

system.time(
  X3 <- unlist(lapply(split(tscores,names(tscores)),mean))
)
   user  system elapsed 
   0.11    0.02    0.13 

EDIT :

system.time(X3 <- sapply(split(tscores,names(tscores)),mean))
   user  system elapsed 
   0.14    0.00    0.14 
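
As a small sanity check, here is a sketch showing that the split-based approach and tapply agree on the per-name means. The tiny vector is made up for illustration (values loosely rounded from the question), and vapply is swapped in for sapply only to pin the result type to one number per group:

```r
# Illustrative data only: a short named vector with repeated names
tscores <- c(-19.66, 0.78, 20.65, 5.63, -7.30, 2.04, -11.37)
names(tscores) <- c("11461", "11461", "11461", "11461", "14433", "14433", "14102")

# Group the values by name once, then average each group
groups <- split(tscores, names(tscores))
means_split <- vapply(groups, mean, numeric(1))

# tapply computes the same per-name means (returned as a 1-d array)
means_tapply <- tapply(tscores, names(tscores), mean)

print(means_split)
stopifnot(isTRUE(all.equal(as.vector(means_split), as.vector(means_tapply))))
stopifnot(identical(names(means_split), names(means_tapply)))
```

Note that split and tapply return the groups in sorted name order, whereas the sapply(unique(...)) version keeps the order in which names first appear, so compare results by name rather than by position.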


Try this:

tapply(tscores, names(tscores), mean)

I'm not sure whether this is more efficient, but it is probably not less efficient.


Hey there, it seems you will be subsetting this vector many times, rather than selecting each name's values just once. Your data layout isn't really geared towards that. So build a list of the values keyed by name, once:

tvalues <- sapply(unique(names(tscores)), function(x, tscores) as.numeric(tscores[names(tscores) == x]), tscores=tscores, simplify=FALSE)

That should give you a list of numeric vectors, one per unique name in tscores, each element named after the name it holds values for (simplify=FALSE keeps the result a list even when every name has the same number of values). Then just use tvalues$name, or tvalues[["name"]], whenever you need a name's values. That should knock an order of magnitude or so off your cost. Apologies for any errors or false assumptions.
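
A minimal sketch of the same lookup idea, built here with split instead of the sapply call (a swapped-in shortcut, not the code above). The five-element vector is made up for illustration:

```r
# Illustrative data only: a named vector with repeated names
tscores <- c(a = 1.5, b = -2.0, a = 0.5, c = 3.0, b = 4.0)

# Build the name -> values list once; unname() drops the per-element names
tvalues <- split(unname(tscores), names(tscores))

# Each later lookup is a plain list access, with no rescan of names(tscores)
tvalues[["a"]]    # 1.5 0.5
mean(tvalues$b)   # 1
```

For names that look like numbers, such as "11461" in the question, use tvalues[["11461"]] or backticks after the $, since a bare tvalues$11461 is not valid R syntax.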
