I'm a bit at a loss as to how to find a clean algorithm for doing the following:
Suppose I have a dict k:
>>> k = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
I now want to randomly select one of these keys, based on the 'weight' they have in the total (i.e. sum) amount of keys.
>>> sum(k.values())
>>> 274
So that there's a
>>> 68.0/274.0
>>> 0.24817518248175183
24.81% percent chan开发者_StackOverflowge that A is selected.
How would you write an algorithm that takes care of this? In other words, that makes sure that on 10.000 random picks, A will be selected 2.481 times?
Here's a weighted choice function, with some code that exercises it.
import random
def WeightedPick(d):
r = random.uniform(0, sum(d.itervalues()))
s = 0.0
for k, w in d.iteritems():
s += w
if r < s: return k
return k
def Test():
k = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
results = {}
for x in xrange(10000):
p = WeightedPick(k)
results[p] = results.get(p, 0) + 1
print results
Test()
This should do the trick:
>>> k = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
>>> import random
>>> def weighted_pick(dic):
... total = sum(dic.itervalues())
... pick = random.randint(0, total-1)
... tmp = 0
... for key, weight in dic.iteritems():
... tmp += weight
... if pick < tmp:
... return key
The algorithm would be this..
Select a number randomly between 1 and 274. To do that, call a rand() funciton (assume it returns a value between 0 and 1), multiply rand() by 274. The resulting value should now be in a range. If its between 1 and 68, select A, if its between 69 and 130 select B and so on. This way, your probability stays alive and your operation succeeds.
PS: I am a Java guy, dont know the syntax of Python.
The simplest way of doing this when your weights are relatively small integers (such as in your example) is to build a long string containing all the characters in the appropriate weights and choose a character at random from it:
import random
d = {'A': 68, 'B': 62, 'C': 47, 'D': 16, 'E': 81}
s = ''.join(k*v for k,v in d.items())
random.choice(s)
Note though that this method will use quite a lot of memory if your weights are large, in which case you might prefer a different solution.
one should also look at this link
make two list for k, say xk
and yk
from scipy import stats
custm = stats.rv_discrete(name='test', values=(xk, yk))
custm.rvs(size=1)
I've developed an algorithm a few years ago, with application in Perl and SQL, you can read about it here, complete with analysis and tests why it (most likely) is correct.
The concept is simple: for each item, pick a random number, pull it through some function that depends on the items' weight, and pick the item with the lowest value.
That function is:
x[i] = -log(1 - rand())/weight[i]
精彩评论