
Sorted quantile mean via Rpy

https://www.devze.com 2023-01-12 15:52 Source: the web

The real goal here is to find the quantile means (or sums, medians, etc.) in Python. Since I'm not a power user of Python but have used R for a while, my chosen route is via Rpy. However, I ran into the problem that the returned list of means does not follow the order of the quantiles. In particular, I have the following in R:

> a = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
> b = c(2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000)
> prob = seq(0,5)/5
> br = quantile(a,prob)
> rcut = cut(a, br, include.lowest = TRUE)
> quintile_means = tapply(b, rcut, mean)
> quintile_means
[1,2.8] (2.8,4.6] (4.6,6.4] (6.4,8.2]  (8.2,10] 
      3        30       300      3000     30000 

which is all as expected. However, when I translate the code into Rpy, I get

>>> import rpy
>>> from rpy import r
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
>>> prob = [ x / 5.0 for x in range(6)]
>>> br = r.quantile(a, prob)
>>> rcut = r.cut(a, br, include_lowest=r.TRUE)
>>> quintile_means = r.tapply(b, rcut, r.mean)
>>> print quintile_means
[30.0, 300.0, 3000.0, 30000.0, 3.0]

Note that the final list is misordered (we know this because a and b are both sorted in this case). In general, I just have no way to recover the correct order from the lowest to highest quantile in Rpy. Any suggestions?

In addition (not instead, as I'd still like to know the answer to the question above), if you can suggest a way to perform the analysis directly in Python, that would be great too. (I don't have numpy or scipy installed.) Thanks!

EDIT: To clarify, a and b are paired but not necessarily sorted. For example, a is the size of the eyes and b is the size of the nose. I'm trying to find, for each quantile of a, the mean of the corresponding values of b. Thanks.


Try rpy2.

With rpy2 >= 2.1.0, this could be:

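from rpy2.robjects.vectors import IntVector
from rpy2.robjects.packages import importr
base = importr('base')    # seq, cut, tapply and mean live in R's base package
stats = importr('stats')  # quantile lives in R's stats package

a = IntVector((1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
b = IntVector((2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000))
# .ro applies R's own arithmetic, so this is seq(0, 5) / 5 evaluated in R
prob = base.seq(0,5).ro / 5
br = stats.quantile(a,prob)
# importr maps include_lowest to R's include.lowest
rcut = base.cut(a, br, include_lowest = True)
# mean is a base function, so it is base.mean rather than stats.mean
quintile_means = base.tapply(b, rcut, base.mean)
print(quintile_means)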


If you don't need the labels (e.g. (8.2,10]), you could call cut with labels=FALSE, which returns plain integer bin codes instead of a factor with interval labels. This should keep the order (and speed up your code for free).
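The integer-bin-code idea also carries over to the direct Python route the question asks about, with no numpy, scipy or R. The following is only a minimal sketch under stated assumptions: it reimplements R's default quantile rule (type 7, linear interpolation) and right-closed bins with the lowest value included, and the names quantile_type7 and bin_means are hypothetical helpers, not part of any library.

```python
import bisect

def quantile_type7(sorted_xs, p):
    # Linear interpolation between order statistics
    # (assumed to match R's default quantile type 7).
    h = (len(sorted_xs) - 1) * p
    lo = int(h)
    hi = min(lo + 1, len(sorted_xs) - 1)
    return sorted_xs[lo] + (h - lo) * (sorted_xs[hi] - sorted_xs[lo])

def bin_means(a, b, n_bins=5):
    # Mean of b within each quantile bin of a, lowest bin first.
    xs = sorted(a)
    breaks = [quantile_type7(xs, i / float(n_bins)) for i in range(n_bins + 1)]
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for x, y in zip(a, b):
        # Bins are right-closed, (lo, hi]; the minimum goes into bin 0,
        # like cut(..., include.lowest=TRUE). k is an integer bin code.
        k = bisect.bisect_left(breaks, x, 1, n_bins) - 1
        sums[k] += y
        counts[k] += 1
    # Empty bins (possible with heavily tied data) yield None.
    return [s / c if c else None for s, c in zip(sums, counts)]

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b = [2, 4, 20, 40, 200, 400, 2000, 4000, 20000, 40000]
print(bin_means(a, b))  # [3.0, 30.0, 300.0, 3000.0, 30000.0]
```

Because the result is indexed by bin code, position 0 is always the lowest quantile, whether or not a and b arrive sorted.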


I just have no way to recover the correct order from the lowest to highest quantile in Rpy

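If sorting the list from the lowest to the highest solves your problem, try sorted(quintile_means). Note that this orders by mean value, which matches the quantile order only when the means rise with the quantiles, as they do in the example.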
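When the means do not rise with the quantiles, one alternative is to order by the bins themselves. The sketch below assumes the result can be obtained as a mapping from R's interval labels to means; that is an assumption, since the question's output shows only a bare list. The dictionary contents and the helper lower_edge are hypothetical illustrations.

```python
def lower_edge(label):
    # Parse the lower bound from an R cut() label such as '(2.8,4.6]'.
    return float(label[1:].split(",")[0])

# Hypothetical label -> mean mapping, deliberately out of order.
means_by_label = {
    "(2.8,4.6]": 30.0,
    "(4.6,6.4]": 300.0,
    "(6.4,8.2]": 3000.0,
    "(8.2,10]": 30000.0,
    "[1,2.8]": 3.0,
}

# Sort the labels by their numeric lower edge, then read off the means.
ordered = [means_by_label[k] for k in sorted(means_by_label, key=lower_edge)]
print(ordered)  # [3.0, 30.0, 300.0, 3000.0, 30000.0]
```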
