In the code below, instead of starting with p=0.01 and incrementing it on each pass, I'd like to be able to do something like p=(1:99)/100 and then just loop through p. So I try to replace the 1:99 in my for loop with p. When I run the code, however, I start having problems with coverage[i] (it returns numeric(0)). It seems like this should be fairly trivial, so I'm hoping I'm just overlooking something in my code.
Also, if you see any easy boosts in efficiency please feel free to chime in! Thanks =)
w=seq(.01,.99,by=.01)
coverage=seq(1:99)
p=0.01
for (i in 1:99){
  count = 0
  for (j in 1:1000){
    x = rbinom(30,1,p)
    se = sqrt(sum(x)/30*(1-sum(x)/30)/30)
    if( sum(x)/30-1.644854*se < p && p < sum(x)/30+1.644854*se )
      count = count + 1
  }
  coverage[i]=count/1000
  print(coverage[i])
  p=p+0.01
}
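For reference, the failing modification presumably looked something like this (my reconstruction of the attempted change, not code posted in the question):

p=(1:99)/100
coverage=seq(1:99)
for (i in p){                       # i is now 0.01, 0.02, ..., not 1, 2, ...
  count = 0
  for (j in 1:1000){
    x = rbinom(30,1,i)
    se = sqrt(sum(x)/30*(1-sum(x)/30)/30)
    if( sum(x)/30-1.644854*se < i && i < sum(x)/30+1.644854*se )
      count = count + 1
  }
  coverage[i]=count/1000            # index 0.01 truncates to 0; assignment is a silent no-op
  print(coverage[i])                # zero-length result -- the problem described above
}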
I would work on the middle part rather than that outer loop.
coverage <- p <- 1:99/100
z <- qnorm(0.95)
for (i in seq_along(p)) {
  # simulate all of the binomials/30 at once
  x <- rbinom(1000, 30, p[i])/30
  # standard errors
  se <- sqrt(x * (1-x)/30)
  # lower and upper limits
  lo <- x - z*se
  hi <- x + z*se
  # coverage
  coverage[i] <- mean(p[i] > lo & p[i] < hi)
}
This is almost instantaneous for me. The trick is to vectorize those simulations and calculations. Increasing to 100,000 simulation replicates took just 4 seconds on my 6-year-old Mac Pro.
(You'll want to increase the replicates to see the structure in the results; plot(p, coverage, las=1) with 100k reps shows that structure clearly, while it wouldn't be apparent with just 1000 reps.)
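As a minimal follow-up (my own illustration, assuming the loop above has been run with the replicate count bumped from 1000 to 1e5), the plot plus a reference line at the nominal level could be drawn with:

plot(p, coverage, las=1, xlab="true p", ylab="estimated coverage")
abline(h = 0.90, lty = 2)   # nominal 90% level of the interval, for reference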
To answer the original question: setting i in p, where p is 0.01, 0.02, etc., means that coverage[i] is trying to do coverage[0.01]; since [] requires an integer index, the 0.01 is truncated to zero, and indexing at zero returns a zero-length vector, numeric(0).
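A minimal console illustration of that indexing behaviour (my own addition, not part of either answer):

coverage <- numeric(99)
coverage[0.01]           # the fractional index truncates to 0 ...
# numeric(0)
coverage[0]              # ... and indexing at 0 returns a zero-length vector
# numeric(0)
coverage[0.01] <- 0.9    # likewise, assigning at index 0 is silently a no-op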
One of the other solutions is better, but for reference, to do it using your original loop, you'd probably want something like
p <- seq(.01, .99, by=.01)
coverage <- numeric(length(p))
N <- 1000
n <- 30
k <- qnorm(1 - 0.1/2)

for (i in seq_along(p)) {
  count <- 0
  for (j in 1:N) {
    x <- rbinom(n, 1, p[i])
    phat <- mean(x)
    se <- sqrt(phat * (1 - phat) / n)
    if (phat - k*se < p[i] && p[i] < phat + k*se) {
      count <- count + 1
    }
  }
  coverage[i] <- count/N
}
coverage
@Karl Broman has a great solution that really shows how vectorization makes all the difference. It can still be improved slightly, though (by around 30%): I prefer using vapply, although the speed gain from vapply itself isn't noticeable here since the outer loop only runs 99 times.
f <- function(n=1000) {
  z <- qnorm(0.95)/sqrt(30)
  vapply(1:99/100, function(p) {
    # simulate all of the binomials/30 at once
    x <- rbinom(n, 30, p)/30
    zse <- z*sqrt(x * (1-x))
    # coverage
    mean(x - zse < p & p < x + zse)
  }, numeric(1))
}
system.time( x <- f(50000) ) # 1.17 seconds
Here's a version of the OP's original code using vapply; it's about 20% faster than the original, but of course still orders of magnitude slower than the fully vectorized solutions...
g <- function(n=1000) {
  vapply(seq(.01, .99, by=.01), function(p) {
    count <- 0L
    for (j in seq_len(n)) {
      x <- sum(rbinom(30, 1, p))/30
      # the constant is qnorm(0.95)/sqrt(30)
      zse <- 0.30030781175850279*sqrt(x*(1-x))
      if (x - zse < p && p < x + zse)
        count <- count + 1L
    }
    count/n
  }, numeric(1))
}
system.time( x <- g(1000) ) # 1.04 seconds
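As a quick sanity check (my own addition, not from the answer), the vectorized f() and the loop-based g() should produce coverage curves that track each other closely:

set.seed(42)
cov_fast <- f(10000)
cov_loop <- g(1000)
plot(1:99/100, cov_fast, type="l", las=1, xlab="p", ylab="coverage")
lines(1:99/100, cov_loop, col="red")
legend("bottom", legend=c("f (vectorized)", "g (loop)"),
       col=c("black", "red"), lty=1)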