Suppose I have an array A. I have a series of index pairs (a1, b1), (a2, b2)开发者_开发知识库 ... (an, bn)
I want to obtain all the sums of the elements between those pairs. i.e.
sum(A[a1:b1]), sum(A[a2:b2]), sum(A[a3:b3]) ...
In terms of run-time, what's the most efficient way of doing this?
Thanks!
Assuming your index pairs are stored in a NumPy array indices
of shape (n, 2)
and n
is fairly large, it is probably best to avoid any Python loop:
c = numpy.r_[0, A.cumsum()][indices]
sums = c[:,1] - c[:,0]
Here's another way:
a = np.random.rand(3000) indices = np.array([[0,3], [9,20], [5,30], [9,33]]) sums = np.add.reduceat(a, indices.ravel())[::2] assert np.all(sums == np.array([a[i:j].sum() for i,j in indices]))
The cumsum
one above is probably more efficient if there are many indices.
If you have a lot of index pairs and your array is long then caching might be an option. I'd try a recursive approach like
CACHE = {}
def mysum(a, b):
if (a, b) in CACHE:
return CACHE[(a, b)]
if a >= b:
return 0
s = A[a] + mysum(a+1, b)
CACHE[(a, b)] = s
return s
Not checked for correctness or efficiency, though. Decreasing the upper index b
could be used as well.
In the first instance I'd try the direct solution:
[np.sum(A[a:b]) for (a,b) in ab]
where ab
is the sequence of pairs.
A[a:b]
creates a view on the array; there's no copying of the data involved.
If this turns out to be too slow, please tell us more about the size of A
, how many pairs of indices you expect to get, whether the (a,b)
ranges tend to overlap, etc.
精彩评论