How to compute weighted means of a vector within factor levels?

开发者 https://www.devze.com 2023-02-07 19:08 Source: web
I am able to successfully get a simple mean of a given vector within factor levels, but in attempting to take it to the next step of weighting the observations, I can't get it to work. This works:

> tapply(exp.f,part.f.p.d,mean)
        1         2         3         4         5         6         7         8         9        10 
0.8535996 1.1256058 0.6968142 1.4346451 0.8136110 1.2006801 1.6112160 1.9168835 1.5135006 3.0312460 

But this doesn't:

> tapply(exp.f,part.f.p.d,weighted.mean,b.pct)
Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length
> 

In the code below, I am trying to find the weighted mean of exp.f, within levels of the factor part.f.p.d, weighted by the observations within b.pct that are in each level.

b.exp <- tapply(exp.f,part.f.p.d,weighted.mean,b.pct)

Error in weighted.mean.default(X[[1L]], ...) : 
  'x' and 'w' must have the same length

I am thinking I must be supplying the incorrect syntax, as all 3 of these vectors are the same length:

> length(b.pct)
[1] 978
> length(exp.f)
[1] 978
> length(part.f.p.d)
[1] 978

What is the correct way to do this? Thank you in advance.


Now I do it like this (thanks to Gavin):

sapply(split(Data, Data$part.f.p.d), function(x) weighted.mean(x$exp.f, x$b.pct))

Others likely use ddply from the plyr package:

ddply(Data, "part.f.p.d", function(x) weighted.mean(x$exp.f, x$b.pct))
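Neither snippet above shows how Data is built. A minimal sketch, assuming Data is simply a data frame holding the three vectors from the question as columns (the toy values below are made up for illustration):

```r
# Hypothetical toy data; only the three column names come from the question.
set.seed(1)
Data <- data.frame(
  exp.f      = runif(12, 0, 5),
  b.pct      = runif(12),
  part.f.p.d = factor(rep(1:3, each = 4))
)

# Split the rows by factor level, then compute one weighted mean per piece.
by.level <- sapply(split(Data, Data$part.f.p.d),
                   function(d) weighted.mean(d$exp.f, d$b.pct))
by.level
```

Because split() cuts whole rows, the values and their weights stay paired inside each piece, which is what the tapply call could not do.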


Your problem is that tapply does not "split" the extra arguments supplied (through its ... arguments) to the function, as it does for the main argument X. See the 'Note' on the help page for tapply (?tapply).

Optional arguments to FUN supplied by the ... argument are not divided into cells. It is therefore inappropriate for FUN to expect additional arguments with the same length as X.
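One way to stay with tapply despite that note is to let it split the positions rather than the values, so that each cell can index both vectors. This is only a sketch on made-up data; the variable names follow the question, and wm is a hypothetical name for the result:

```r
set.seed(42)
exp.f      <- runif(10, 0, 5)
b.pct      <- runif(10)
part.f.p.d <- factor(rep(letters[1:5], 2))

# tapply() splits the positions 1..n by level; each cell then uses the
# same positions to pick matching values and weights.
wm <- tapply(seq_along(exp.f), part.f.p.d,
             function(i) weighted.mean(exp.f[i], b.pct[i]))
wm
```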

Here is a hacky solution.

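# Dummy data: values, a grouping factor, and non-negative weights
exp.f <- rnorm(10)
part.f.p.d <- factor(sample(1:5, size = 10, replace = TRUE))
b.pct <- runif(10)
# Split values and weights by the same factor so the pieces line up
a <- split(exp.f, part.f.p.d)
b <- split(b.pct, part.f.p.d)
# Weighted mean of each piece of values, using the matching piece of weights
lapply(seq_along(a), function(i){
  weighted.mean(a[[i]], b[[i]])
})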
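The same pairing of the two split lists can probably be written with mapply, which walks both lists in parallel and keeps the level names on the result. A sketch on made-up data (res is a hypothetical name):

```r
set.seed(7)
exp.f      <- rnorm(10)
part.f.p.d <- factor(rep(1:5, 2))
b.pct      <- runif(10)

a <- split(exp.f, part.f.p.d)   # values by level
b <- split(b.pct, part.f.p.d)   # weights by level, in the same order

# mapply() calls weighted.mean(a[[i]], b[[i]]) for each level and
# simplifies the results to a named numeric vector.
res <- mapply(weighted.mean, a, b)
res
```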


I've recreated the error with some dummy data. I'm assuming that part.f.p.d is some kind of factor that you're using to separate the other vectors.

b.pct <- sample(1:100, 10) / 100
exp.f <- sample(1:1000, 10)
part.f.p.d <- factor(rep(letters[1:5], 2))

tapply(exp.f, part.f.p.d, mean) # this works
tapply(exp.f, part.f.p.d, weighted.mean, w = b.pct) # this doesn't

A call to traceback() helps to uncover the problem. The second call fails because the INDEX argument (i.e. part.f.p.d) that you passed to tapply() is used to split only the X argument (i.e. exp.f) into smaller vectors. Each of these pieces is then passed to weighted.mean() together with the full w argument (i.e. b.pct), which was not split, so the lengths no longer match.
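The mismatch can be made visible by comparing the size of each piece with the size of the weight vector. A small sketch with deterministic dummy values (pieces and msg are hypothetical names):

```r
b.pct      <- (1:10) / 10
exp.f      <- seq(100, 1000, by = 100)
part.f.p.d <- factor(rep(letters[1:5], 2))

pieces <- split(exp.f, part.f.p.d)
lengths(pieces)   # 2 values per level
length(b.pct)     # 10 weights, never split

# Capture the error message instead of stopping the script.
msg <- tryCatch(
  tapply(exp.f, part.f.p.d, weighted.mean, w = b.pct),
  error = function(e) conditionMessage(e)
)
msg
```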

EDIT: This should do what you want.

sapply(levels(part.f.p.d), 
       function(whichpart) weighted.mean(x = exp.f[part.f.p.d == whichpart], 
                                         w = b.pct[part.f.p.d == whichpart]))