开发者

Solving a simple maximization game

开发者 https://www.devze.com 2023-04-12 05:03 出处:网络
I\'ve got a very simple question about a game I created (this is not homework): what should the following method contain to maximize payoff:

I've got a very simple question about a game I created (this is not homework): what should the following method contain to maximize payoff:

private static boolean goForBiggerResource() {
    return ... // I must fill this
};

Once again I stress that this is not homework: I'm trying to understand what is at work here.

The "strategy" is trivial: there can only be two choices: true or false.

The "game" itself is very simple:

P1  R1        R2 P2


          R5


P3  R3        R4 P4
  • there are four players (P1, P2, P3 and P4) and five resources (R1, R2, R3, R4 all worth 1 and R5, worth 2)

  • each player has exactly two options: either go for a resource close to its starting location that gives 1 and that the player is sure to get (no other player can get to that resource first) OR the player can try to go for a resource that is worth 2... But other players may go for it too.

  • if two or more players go for the bigger resource (the one worth 2), then they'll arrive at the bigger resource at the same time and only one player, at random, will get it and the other player(s) going for that resource will get 0 (they cannot go back to a resource worth 1).

  • each player play the same strategy (the one defined in the method goForBiggerResource())

  • players cannot "talk" to each other to agree on a strategy

  • the game is run 1 million times

So basically I want to fill the method goForBiggerResource(), which returns either true or false, in a way to maximize the payoff.

Here's the code allowing to test the solution:

private static final int NB_PLAYERS = 4;
private static final int NB_ITERATIONS = 1000000;

public static void main(String[] args) {
    double totalProfit = 0.0d;
    for (int i = 0; i < NB_ITERATIONS; i++) {
        int nbGoingForExpensive = 0;
        for (int j = 0; j < NB_PLAYERS; j++) {
            if ( goForBiggerResource() ) {
                nbGoingForExpensive++;
            } else {
                totalProfit++;
            }
        }
        totalProfit += nbGoingForExpensive > 0 ? 2 : 0;
 开发者_运维知识库   }
    double payoff = totalProfit / (NB_ITERATIONS * NB_PLAYERS);
    System.out.println( "Payoff per player: " + payoff );
}

For example if I suggest the following solution:

private static boolean goForBiggerResource() {
    return true;
};

Then all four players will go for the bigger resource. Only one of them will get it, at random. Over one million iteration the average payoff per player will be 2/4 which gives 0.5 and the program shall output:

Payoff per player: 0.5

My question is very simple: what should go into the method goForBiggerResource() (which returns either true or false) to maximize the average payoff and why?


Since each player uses the same strategy described in your goForBiggerResource method, and you try to maximize the overall payoff, the best strategy would be three players sticking with the local resource and one player going for the big game. Unfortunately since they can not agree on a strategy, and I assume no player can not be distinguished as a Big Game Hunter, things get tricky.

We need to randomize whether a player goes for the big game or not. Suppose p is the probability that he goes for it. Then separating the cases according to how many Big Game Hunters there are, we can calculate the number of cases, probabilities, payoffs, and based on this, expected payoffs.

  • 0 BGH: (4 choose 0) cases, (1-p)^4 prob, 4 payoff, expected 4(p^4-4p^3+6p^2-4p+1)
  • 1 BGH: (4 choose 1) cases, (1-p)^3*p prob, 5 payoff, expected 20(-p^4+3p^3-3p^2+p)
  • 2 BGH: (4 choose 2) cases, (1-p)^2*p^2 prob, 4 payoff, expected 24(p^4-2p^3+p^2)
  • 3 BGH: (4 choose 3) cases, (1-p)*p^3 prob, 3 payoff, expected 12(-p^4+p^3)
  • 4 BGH: (4 choose 4) cases, p^4 prob, 2 payoff, expected 2(p^4)

Then we need to maximize the sum of the expected payoffs. Which is -2p^4+8p^3-12p^2+4p+4 if I calculated correctly. Since the first term is -2 < 0, it is a concave function, and hopefully one of the roots to its derivative, -8p^3+24p^2-24p+4, will maximize the expected payoffs. Plugging it into an online polynomial solver, it returns three roots, two of them complex, the third being p ~ 0.2062994740159. The second derivate is -24p^2+48p-24 = 24(-p^2+2p-1) = -24(p-1)^2, which is < 0 for all p != 1, so we indeed found a maximum. The (overall) expected payoff is the polynomial evaluated at this maximum, around 4.3811015779523, which is a 1.095275394488075 payoff per player.

Thus the winning method is something like this

private static boolean goForBiggerResource ()
{
    return Math.random() < 0.2062994740159;
}

Of course if players can use different strategies and/or play against each other, it's an entirely different matter.

Edit: Also, you can cheat ;)

private static int cheat = 0;

private static boolean goForBiggerResource ()
{
    cheat = (cheat + 1) % 4;
    return cheat == 0;
}


I take it you tried the following:

private static boolean goForBiggerResource() {
    return false;
};

where none of the player try to go for the resource that is worth 2. They are hence guaranteed to each get a resource worth 1 every time hence:

Payoff per player: 1.0

I suppose also that if you ask this nice question is because you guess there's a better answer.

The trick is that you need what is called a "mixed strategy".

EDIT: ok here I come with a mixed-strategy... I don't get how Patrick found the 20% that fast (when he commented, only minutes after you posted your question) but, yup, I found out basically that same value too:

private static final Random r = new Random( System.nanoTime() );

private static boolean goForBiggerResource() {
    return r.nextInt(100) < 21;
}

Which gives, for example:

Payoff per player: 1.0951035

Basically if I'm not mistaken you want to read the Wikipedia page on the "Nash equilibrium" and particularly this:

"Nash Equilibrium is defined in terms of mixed strategies, where players choose a probability distribution over possible actions"

Your question/simple example if I'm not mistaken also can be used to show why colluding players can do better average payoffs: if players could colude, they'd get 1.25 on average, which beats the 1.095 I got.

Also note that my answers contains approximation errors (I only check random numbers from 0 to 99) and depends a bit on the Random PRNG but you should get the idea.


if the players cannot cooperate and have no memory there is only one possible way to implement goForBiggerResource: choose a value randomly. Now the question is what is the best rate to use.

Now simple mathematics (not really programming related):

  • assume the rate x represents the probability to stay with the small resource;
  • therefore the chance for no player going for the big one is x^4;
  • so the chance for at least one player going to the big one is 1-x^4;
  • total profit is x + ( 1 - x^4 ) / 2
  • find the maximum of that formula for 0% <= x <= 100%

the result is about 79.4% (for returning false)


Mmm, I think your basic problem is that the game as described is trivial. In all cases, the optimal strategy is to stick with the local resource, because the expected payoff for going for R5 is only 0.5 (1/4 * 2). Raise the reward for R5 to 4, and it becomes even; there's no better strategy. reward(R5)>4 and it always pays to take R5.

0

精彩评论

暂无评论...
验证码 换一张
取 消