Unbiased (random?) selection algorithm_问答_开发者

The problem: There are X number of properties, all floats between 0 and 1.

Choosing a property has a constant cost of C. (as opposed to leaving it at 0)

The cost of a property is proportional to its value (be it exponential or linear) How would I go making an unbiased (randomized?) selection of a subset of properties given a budget B?

Lets say the "cost" function is something like the following: (exponential version)

cost = C*sgn(x) + ke^(ax)
0 <= x <= 1
Constants: C, k, a

My first thought was some kind of optimization problem, but there's really nothing to maximize/minimize. I guess you could view it as finding a solution as close as possible to B. That doesn't really make sense though, since I'm not looking for the "best" solution, any solution close enough to B would do.

I then started looking at random sampling which s开发者_运维技巧eems like the most similar problem. I've found something called random weighted sampling which looks promising, but I'm not sure how the "budget" would fit in.

I'm not looking for something very exact or that guarantees independent results. Perhaps I am over complicating this? At this stage I'm just looking for something quick and dirty that can be implemented in Java or a similar language.

Edit: I followed the advice below and posted the question at here at math.stackexchange.com. I think I made it a lot more clear over there what I'm trying to achieve

You can use the maximum entropy principle to guide you here. http://en.wikipedia.org/wiki/Principle_of_maximum_entropy . Basically there are a huge set of assignments to the X, some (very small) subset of those assignments will exactly satisfy your budget. You want to pick uniformly at random from that set of satisfying assignments.

Unfortunately, while that gives me a clear and principled direction to think about the problem, I don't actually know how to efficiently sample from that set.

MarianP's comment seems to have a right direction. For example, if you have 3 properties with weights 1, 3 and 6 then you could mentally divide a line of length 1 + 3 + 6 = 10 to three segments:

+-+---+------+
|0|123|456789| 
+-+---+------+

Then roll a dice with range (0,9) and chose the segment (property) which is hit by the dice. Remove that property from the set, and recurse from top with the new set of properties (with those over the budget also filtered out).

This way a longer sequence (heavier property) is selected with higher chance.

Here I try and start with an inefficient procedure that satisfies one view of randomness and replace it with a more efficient one.

1) Inefficient - I assume that you have a list of the properties and their associated costs. Consider all possible sets of properties - if there are N properties there will be 2^N such sets. From this consider the set of sets of properties that fit within your budget. From this smaller (but probably still huge) set choose a set of properties at random.

2) Possible implementation. I presume that the costs are all integers of moderate size, or that you can accept the inaccuracy involved in rounding them to this. You can now build up an array A where A[i] is the number of ways of creating a set of total cost i. With 0 properties considered, A[0] = 1. Now consider a property of cost 3. The only non-zero element of A at the moment is A[0]. This gives you another way to generate A[0 + 3 = 3], so you can set A[3] = 1 + A[0]. By considering each property in turn you can set A[i] = the number of ways in which you can create a set of cost i, using these properties.

Once you have built the array, Consider each property in turn, starting with a cost of B. Chose an exact cost C by sampling from the costs >= B with probability proportional to the number of ways of creating a set of that cost. Now consider each property in turn. Given a property of cost c, if it is more than the curent cost C discard it. If not, consider A[C] and A[C - c] and choose between then with probablity proportional to their size. If you choose A[C] you discard the property. If you choose A[C - c] you include the property in your random set.

There is probably a way of doing something equivalent without rounding to integers using http://en.wikipedia.org/wiki/Metropolis%E2%80%93Hastings_algorithm - maybe I'll sit down and worry about this later.

Yes - this falls out pretty easily from Metropolis-Hastings. Basically, you start with any valid set - such as the null set, run a large number of Metropolis-Hastings steps, and hope that the result is mixed well enough that the resulting set is pretty much random according to your distribution. If you want another random sample, run it through lots more M-H steps again.

In the language of the Wikipedia entry, P(x') = P(x_t) = 1. To compute Q(x_t;x') you need to decide how to move from set to set. If you arrange the properties in sorted order of cost you can work out, given the total cost of a set, how many of the remaining properties you can afford to add to it - e.g. do a binary chop to work out the number of properties that are cheap enough, and use some sort of balanced tree to count the number of such properties already selected. So you can work out the number of different properties you could add to the current set, and of course you could also subtract any of the existing properties. If a step is either to add or to subtract a single property, you know how many different ways there are to go from away from x_t, and you can do a similar calculation for x'. So Q(x';x_t) is 1/(number of ways out of x_t) and Q(x_t;x') is 1/(number of ways out of x') and once you have decided whether to move or not, if you decide to move, you select one of these ways out of x_t at random.