I have a large vector of values with non-unique names, for example:
tscores
11461 11461 11461 11461 14433
-1.966196e+01 7.808853e-01 2.065178e+01 5.630565e+00 -7.295436e+00
14433 14433 14433 14433 14433
2.036339e+00 -6.704906e+00 1.603803e+00 -1.118324e+01 1.450554e+00
14102 16153 16189 18563 18563
-1.137429e+01 7.053336e-02 1.011208e+00 -7.811194e+00 -6.749376e-01
18563 18563 22042 22042 22042
7.480217e-01 -9.909211e-01 -9.577424e-01 -7.887699e-02 -4.867706e-01
I'd like to be able to pull out a subvector of all values that correspond to a name more efficiently. At the moment, I'm using:
u_tscores <- sapply(unique(names(tscores)), function(name, scores) {mean(scores[names(scores)==name])}, scores=tscores)
Which is far too slow for what I need. I know there has to be an easier way to get all values with the same name.
Your best bet is to use lapply on the list obtained by split(tscores,names(tscores)). That gives you roughly a fivefold speedup:
n <- 1000000
tscores <- runif(n)
names(tscores) <- sample(letters,n,replace=TRUE)
system.time(
X <- tapply(tscores, names(tscores), mean)
)
user system elapsed
0.89 0.00 0.89
system.time(
X2 <- sapply(unique(names(tscores)), function(name, scores){
mean(scores[names(scores)==name])}, scores=tscores)
)
user system elapsed
0.73 0.05 0.78
system.time(
X3 <- unlist(lapply(split(tscores,names(tscores)),mean))
)
user system elapsed
0.11 0.02 0.13
EDIT :
system.time(X3 <- sapply(split(tscores,names(tscores)),mean))
user system elapsed
0.14 0.00 0.14
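To make the split-based approach concrete, here is a minimal sketch on a toy vector. The values are made up for illustration, and vapply is swapped in for sapply only because it checks that every group yields a single numeric; it is not part of the timings above.

```r
# Toy named vector with repeated names (values are illustrative only)
tscores <- c(1.5, 2.5, 4.0, -1.0, 3.0)
names(tscores) <- c("11461", "11461", "14433", "14433", "14433")

# split() groups the values by name once, instead of rescanning
# names(tscores) for every unique name
groups <- split(tscores, names(tscores))

# One mean per group; numeric(1) declares the expected result shape
u_tscores <- vapply(groups, mean, numeric(1))
u_tscores
```

The result is a plain named numeric vector, with the group names in sorted order.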
Try this:
tapply(tscores, names(tscores), mean)
I'm not sure if this is more efficient, but it's probably not less efficient...
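One thing to watch when swapping in tapply: it returns a 1-d array with the group names in sorted order, whereas sapply over unique(names(...)) keeps first-appearance order. A small sketch with made-up values, comparing the two after reordering:

```r
# Toy data (illustrative): "b" appears before "a"
tscores <- c(b = 2, a = 1, b = 4, a = 3)

# tapply: 1-d array, groups in sorted order (a, b)
by_tapply <- tapply(tscores, names(tscores), mean)

# Original approach: groups in first-appearance order (b, a)
by_unique <- sapply(unique(names(tscores)),
                    function(nm) mean(tscores[names(tscores) == nm]))

# Same values once the tapply result is reordered by name
same <- isTRUE(all.equal(as.vector(by_tapply[names(by_unique)]),
                         unname(by_unique)))
same
```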
It seems you will be subsetting this vector many times, rather than selecting each name's values just once. The current layout isn't well suited to that, so first group the values by name:
tvalues <- sapply(unique(names(tscores)), function(x, tscores) as.numeric(tscores[names(tscores) == x]), tscores=tscores, simplify=FALSE)
That should give you a list of numeric vectors, one per unique name in tscores, with each element named after its group. Then just use tvalues[[name]] (or tvalues$name when the name is syntactically valid) whenever you need to select a name's values. That should knock an order of magnitude or so off the cost of repeated lookups. Apologies for any errors or false assumptions.
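As a sketch of the lookup idea, the same list can also be built with split, which is an assumption of convenience here rather than this answer's original code. The values are rounded from the question's sample. Note that names such as 14433 are not syntactic R names, so `$` needs backticks while `[[` works with any string:

```r
# Illustrative values, rounded from the question's sample
tscores <- c(-19.66, 0.78, 20.65, 5.63, -7.30, 2.04)
names(tscores) <- c("11461", "11461", "11461", "11461", "14433", "14433")

# Build the lookup once; unname() drops the repeated names from each piece
tvalues <- lapply(split(tscores, names(tscores)), unname)

tvalues[["14433"]]   # works for any name, including ones held in a variable
tvalues$`14433`      # `$` requires backticks for non-syntactic names
```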