OK, I think this will be fairly simple, but my numpy-fu is not quite strong enough. I've got an array A of ints; it's tiled N times. I want a running count of the number of times each element has been used.
For example, the following (I've reshaped the array to make the repetition obvious):
    [0, 1, 2, 0, 0, 1, 0]
    [0, 1, 2, 0, 0, 1, 0] ...
would become:
    [0, 0, 0, 1, 2, 1, 3]
    [4, 2, 1, 5, 6, 3, 7]
This python code does it, albeit inelegantly and slowly:
    from collections import defaultdict

    def running_counts(ar):
        # counts[num] is how many times num has been seen so far
        counts = defaultdict(int)
        def get_count(num):
            c = counts[num]
            counts[num] += 1
            return c
        return [get_count(num) for num in ar]
I can almost see a numpy trick to make this go, but not quite.
Update
Ok, I've made improvements, but still rely on the above running_counts method. The following speeds things up and feels right-track-ish to me:
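    import numpy as np

    def sample_counts(ar, repititions):
        # occurrences of each value in one tile (assumes non-negative ints)
        tile_bins = np.bincount(ar)
        tile_mult = tile_bins[ar]
        # running counts within the first tile
        first_steps = running_counts(ar)
        tiled = np.tile(tile_mult, repititions).reshape(repititions, -1)
        multiplier = np.reshape(np.arange(repititions), (repititions, 1))
        # row r starts from r full tiles' worth of earlier occurrences
        tiled *= multiplier
        tiled += first_steps
        return tiled.ravel()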
Any elegant thoughts on how to get rid of running_counts()? Speed is now OK; it just feels a little inelegant.
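To spell out why the tiling trick works: in repetition r, an element has already been seen r times the number of occurrences of its value in one tile, plus however many times it appeared earlier in the current tile. Here is a minimal sketch that checks this identity against a brute-force count on the example array. The names `brute_force_counts` and `tiled_counts` are made up for this illustration, and the values are assumed to be non-negative integers so that `np.bincount` applies.

```python
import numpy as np
from collections import defaultdict

def brute_force_counts(values):
    """Running count of prior occurrences, one element at a time."""
    seen = defaultdict(int)
    out = []
    for v in values:
        out.append(seen[v])
        seen[v] += 1
    return out

def tiled_counts(tile, repetitions):
    """Running counts for `tile` repeated `repetitions` times."""
    tile = np.asarray(tile)
    # occurrences of each element's value within a single tile
    per_tile = np.bincount(tile)[tile]
    # running counts within the first tile only
    first = np.array(brute_force_counts(tile.tolist()))
    # column vector 0..repetitions-1, broadcast against the tile
    reps = np.arange(repetitions)[:, None]
    return (reps * per_tile + first).ravel()

tile = [0, 1, 2, 0, 0, 1, 0]
result = tiled_counts(tile, 2)
print(result)  # [0 0 0 1 2 1 3 4 2 1 5 6 3 7]
```

The second row of the output is just the first row plus the per-tile counts [4, 2, 1, 4, 4, 2, 4], which matches the expected result in the question.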
Here's my take on it:
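    import numpy as np

    def countify2(ar):
        ar2 = np.ravel(ar)
        ar3 = np.empty(ar2.shape, dtype=np.int32)
        uniques = np.unique(ar2)
        myarange = np.arange(ar2.shape[0])
        for u in uniques:
            mask = ar2 == u
            # number the occurrences of u as 0, 1, 2, ... in order of appearance
            ar3[mask] = myarange[:mask.sum()]
        return ar3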
This method is most effective when there are many more elements than there are unique elements.
Yes, it is similar to Sven's, but I really did write it up long before he posted. I just had to run somewhere.
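If there are many distinct values, the Python loop over the uniques can start to dominate. One loop-free alternative is to sort stably and rank each element within its run of equal values. This is a sketch of a different technique and not part of the answer above; the function name `running_counts_sorted` is made up here, and it assumes the input can be sorted with `np.argsort`.

```python
import numpy as np

def running_counts_sorted(ar):
    """Loop-free running count of prior occurrences of each element."""
    flat = np.ravel(ar)
    n = flat.shape[0]
    if n == 0:
        return np.empty(0, dtype=np.intp)
    # stable sort: equal values keep their original relative order
    order = np.argsort(flat, kind="stable")
    sorted_vals = flat[order]
    # True at the first element of each run of equal values
    is_start = np.empty(n, dtype=bool)
    is_start[0] = True
    is_start[1:] = sorted_vals[1:] != sorted_vals[:-1]
    # for each sorted position, the index where its run begins
    run_start = np.maximum.accumulate(np.where(is_start, np.arange(n), 0))
    # rank within the run, scattered back to the original positions
    counts = np.empty(n, dtype=np.intp)
    counts[order] = np.arange(n) - run_start
    return counts

example = np.tile([0, 1, 2, 0, 0, 1, 0], 2)
print(running_counts_sorted(example))  # [0 0 0 1 2 1 3 4 2 1 5 6 3 7]
```

The cost is one O(n log n) sort instead of one pass over the array per unique value, so which version is faster depends on how many distinct values there are.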