What's wrong with this snippet of code?
import numpy as np
from scipy import stats
d = np.arange(10.0)
cutoffs = [stats.scoreatpercentile(d, pct) for pct in range(0, 100, 20)]
f = lambda x: np.sum(x > cutoffs)
fv = np.vectorize(f)
# why don't these two lines output the same values?
[f(x) for x in d] # => [0, 1, 2, 2, 3, 3, 4, 4, 5, 5]
fv(d) # => array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])
Any ideas?
cutoffs is a list. The numbers you extract from d are all turned into float and applied using numpy.vectorize. (It's actually rather odd: it looks like first it tries numpy floats, which work like you want, and then it tries normal Python floats.) By a rather odd, stupid behavior in Python 2, floats are always less than lists (Python 3 raises a TypeError for this comparison instead), so instead of getting things like
>>> # Here is a vectorized array operation, like you get from numpy. It won't
>>> # happen if you just use a float and a list.
>>> 2.0 > [0.0, 1.8, 3.6, 5.4, 7.2]
[True, True, False, False, False] # not real
you get
>>> # This is an actual copy-paste from a Python interpreter
>>> 2.0 > [0.0, 1.8, 3.6, 5.4, 7.2]
False
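As a rough illustration of the difference, here is a minimal sketch comparing the three cases. The names cutoffs_list, numpy_scalar_result, array_result and plain_result are made up for this example, and the cutoff values are just the ones from the session above. Note that the False result shown above is Python 2 behavior; the sketch assumes Python 3, where the plain float-versus-list comparison raises TypeError instead.

```python
import numpy as np

cutoffs_list = [0.0, 1.8, 3.6, 5.4, 7.2]  # same values as in the session above

# A numpy scalar compared with a list: numpy converts the list and
# compares element by element.
numpy_scalar_result = np.float64(2.0) > cutoffs_list

# A plain Python float compared with a numpy array: the array takes over
# the comparison, so this is also element by element.
array_result = 2.0 > np.array(cutoffs_list)

# A plain Python float compared with a plain list: no numpy involved.
# Python 2 returns False here; Python 3 raises TypeError.
try:
    plain_result = 2.0 > cutoffs_list
except TypeError:
    plain_result = None  # reached on Python 3

print(numpy_scalar_result)
print(array_result)
print(plain_result)
```

Only the last case is the problematic one, and it is what each call inside the vectorized function ends up doing once the values have become plain Python floats.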
To solve the problem, you can make cutoffs a numpy array instead of a list. (You could probably also move the comparison into numpy operations entirely instead of faking it with numpy.vectorize, but I do not know offhand.)
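A minimal sketch of both options, assuming the cutoff values from the interpreter session above rather than recomputing them with scipy. The names counts_vectorized and counts_broadcast are made up for this example, and the broadcasting version is just one possible way to do the comparison entirely in numpy, not necessarily the best one.

```python
import numpy as np

d = np.arange(10.0)

# Cutoffs as a numpy array instead of a list (values taken from the
# interpreter session above; hard-coded here for illustration).
cutoffs = np.array([0.0, 1.8, 3.6, 5.4, 7.2])

# Option 1: keep np.vectorize. Each x is compared against the whole
# array, so x > cutoffs is an elementwise boolean array.
f = lambda x: np.sum(x > cutoffs)
fv = np.vectorize(f)
counts_vectorized = fv(d)

# Option 2: skip np.vectorize and use broadcasting. d[:, None] has
# shape (10, 1), cutoffs has shape (5,), so the comparison yields a
# (10, 5) boolean array; summing along axis 1 counts the cutoffs
# that each value exceeds.
counts_broadcast = (d[:, None] > cutoffs).sum(axis=1)

print(counts_vectorized)
print(counts_broadcast)
```

Both should agree with the list comprehension from the question, [f(x) for x in d].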