I am trying to compute a definite double integral using scipy. The integrand is a bit complicated, as it contains some probability distributions to give weight to how likely is each value of x and y (like a mixture model). The following code evaluated to a negative number, but it should be bound by [0,1]. Additionally, it took about half an hour to compute.
I have two questions.
1) Is there a better way to ca开发者_如何学运维lculate this integral?
2) Where is this negative value coming from? The big question for me is how to speed the calculation up, as I can find the bug in my code that's leading to the negative later on my own.
from scipy import stats
from scipy.integrate import dblquad
import itertools
p= [list whose entries are each different stats.beta(a,b) distributions]
def integrand(x,y):
delta=x-y
marg=0
for distA,distB in itertools.permutations(p,2):
first=distA.pdf(x)
second=distB.pdf(y)
weight1=0
weight2=0
for distC in p:
if distC == distA:
continue
w1=distC.cdf(x)-distC.cdf(y)
if weight1 == 0:
weight1=w1
else:
weight1=weight1*w1
marg+=(first*weight1*second)
I=delta*marg
return I
expect=dblquad(integrand,0,1,lambda x: 0, lambda x: x)
This is asking essentially what for the expected value of the maximal distance between two points is in a vector of distributions. The limits of integration are y ∊ [0,x] and x ∊ [0,1]. This gave me about -.49, with an estimated error of the integral on the order of 10e-10, so it shouldn't be due to the integration method.
I've been fighting with this for a while and appreciate any help. Thanks.
edit: corrected typo
There are several ways to increase the speed of your computation.
You can use the
epsabs
andepsrel
parameters todblquad
to increase the tolreance of your integration. Of course, your results will be less accurate, but for debugging this is fine.You can considerably reduce the number of function evaluations in
integrand
by reordering the code like (warning, untested code)def integrand(x, y): marg = 0.0 cdf = dict((id(distC), distC.cdf(x) - distC.cdf(y)) for distC in p) for distA in p: weight = numpy.prod(cdf[id(distC)] for distC in p if distC is not distA) marg += weight * distA.pdf(x) * sum( distB.pdf(y) for distB in p if distB is not distA) return (x-y) * marg
But note that Python has quite an overhead for function calls, so writing this in pure Python won't get you too far (using something like Cython for this problem might help a bit).
I do not know why the integral becomes negative. Maybe I could tell you if you would give an example for p
-- this would enable us to actually try your code.
The error given by the integration method is just a number telling you how well the convergence behaviour is. Have you tried to calculate explicit values of the integrand?
By the way: Are you integrating pdf's? If yes: Are you sure about your integration limits?
精彩评论