Strategies for games with incomplete information

Are there general strategies for games with incomplete information, especially trick-taking games like Bridge and Doppelkopf?

I am interested in ways to implement AI strategies for such games.

The bounty is for the answer with the best description of a particular strategy.


I would like to contribute some concrete information about the card game Doppelkopf, which the author mentioned as an example. In 2012, Sievers wrote a master's thesis in which he adapted the UCT algorithm to Doppelkopf.

UCT normally assumes a perfect-information game, so he first solves the "card assignment" problem: guessing a card assignment for each player that is consistent with the cards already known.
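
As a rough illustration of what guessing such an assignment could look like (the function name and card representation below are made up, not taken from the thesis), one simple determinization is to shuffle the cards you cannot see and deal them out to the other players:

    import random

    def sample_card_assignment(unseen_cards, hand_sizes, rng=random):
        """Guess one concrete deal: randomly distribute the cards we cannot see
        among the other players, respecting how many cards each still holds.
        (Hypothetical helper; a real version would also honour known constraints,
        e.g. a player who failed to follow suit cannot hold that suit.)"""
        cards = list(unseen_cards)
        rng.shuffle(cards)
        assignment = {}
        index = 0
        for player, size in hand_sizes.items():
            assignment[player] = cards[index:index + size]
            index += size
        return assignment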

After solving this, he tried two ways of combining UCT with his solution to the card assignment problem:

1) Guess one card assignment per UCT tree and average the results of multiple trees. He calls this strategy ensemble UCT (sketched below, after this list).

2) Keep a single UCT tree and guess a new card assignment for each rollout; in the selection phase of UCT you simply ignore all children that are inconsistent with that assignment. He calls this strategy single UCT.
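
As a rough, hypothetical sketch of the ensemble variant (none of the names here come from the thesis; run_uct stands in for a full UCT search on one guessed deal, and sample_assignment for the card-assignment guesser):

    from collections import defaultdict

    def ensemble_uct(run_uct, sample_assignment, observed_state, n_trees=10):
        """Ensemble UCT as described above (all names are illustrative):
        build one UCT tree per guessed card assignment and average the root
        move values across the trees."""
        totals = defaultdict(float)
        for _ in range(n_trees):
            deal = sample_assignment(observed_state)     # one determinization
            move_values = run_uct(observed_state, deal)  # {move: estimated value}
            for move, value in move_values.items():
                totals[move] += value / n_trees
        return max(totals, key=totals.get)               # best move on average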

My feeling was that 2) should make the stronger AI, but it turned out to be weaker, which he showed more clearly in a follow-up conference paper from 2015.

Inspired by AlphaGo's success, I started a project with a friend for his bachelor thesis: he built a policy neural network, using a character-based LSTM, to guide the selection phase of the UCT algorithm. His bachelor thesis only covers test results for ensemble UCT, but I have already tested it with the single UCT player too, and it yields a much stronger AI. I suspect this is because the single UCT player benefits a lot more from the more effective reduction of the search space.
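
One common way to let a policy network guide the selection phase (the PUCT rule popularized by AlphaGo; this is a generic sketch with hypothetical node fields, not the exact formula from the thesis) is to weight each child's exploration bonus by the network's prior probability for that move:

    import math

    def select_child(node, c_puct=1.5):
        """PUCT-style selection: pick the child maximizing its value estimate
        plus a prior-weighted exploration bonus. Assumes each child node has
        visits, value_sum and prior attributes (hypothetical)."""
        total_visits = sum(child.visits for child in node.children)

        def puct_score(child):
            exploit = child.value_sum / child.visits if child.visits else 0.0
            explore = c_puct * child.prior * math.sqrt(total_visits) / (1 + child.visits)
            return exploit + explore

        return max(node.children, key=puct_score)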

So this answer is more or less the same as the one @charley gave, but a little more concrete.


I think Expectimax is generally used for these types of problems. The strategy is to pick the move with the best expected value, averaging over the ways the hidden information could turn out (chance nodes) while assuming the opponent plays to minimize your score.
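
For a trick-taking game this is usually the expectiminimax variant, where chance nodes average over the ways the hidden cards could lie. A minimal sketch, assuming a hypothetical state API:

    def expectiminimax(state, depth, maximizing):
        """Max nodes pick the best move for us, min nodes the worst for us,
        and chance nodes average over the possible resolutions of hidden
        information. The state methods used here are assumptions."""
        if depth == 0 or state.is_terminal():
            return state.evaluate()
        if state.is_chance_node():
            return sum(prob * expectiminimax(outcome, depth - 1, maximizing)
                       for outcome, prob in state.outcomes())
        values = [expectiminimax(child, depth - 1, not maximizing)
                  for child in state.children()]
        return max(values) if maximizing else min(values)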


Problems where you have "incomplete information" tend to be addressed well with "Expert Systems" or filtering mechanisms. Remember, "game theory" merely relates to "optimizing a result" (optimizing something at the expense of everything else), so even approaches like expert systems can be codified into actual user-interactive games.

An "expert system" example of "incomplete information" would be: I need a new car. The universe starts with all "known" cars, or possibly a dynamically (perhaps randomly) generated set of "possible" cars (e.g., different features/models, different manufacturers, different capabilities, etc.) Then, the system can ask me questions, like:

QUESTION: What is most important?

  1. hauling capacity
  2. gas mileage
  3. price
  4. I don't know

The important thing is the "I don't know" -- it must be an option for every question, because the answers will result in "filtering" operations (e.g., remove possible cars from the available cars) or "ranking" operations (e.g., sorting some as "preferred" over others).
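
A minimal sketch of that filter-and-rank step (the car fields and question keys below are invented for illustration); note that "I don't know" simply leaves the universe untouched:

    def apply_answer(cars, question_key, answer):
        """Apply one answer as either a filter or a ranking; "I don't know"
        changes nothing. All keys and fields are illustrative."""
        if answer == "I don't know":
            return cars                              # no information, no change
        if question_key == "max_price":
            return [car for car in cars if car["price"] <= answer]           # filter
        if question_key == "most_important":
            return sorted(cars, key=lambda car: car[answer], reverse=True)   # rank
        return cars

    cars = [
        {"name": "pickup",    "price": 35000, "hauling": 9, "gas_mileage": 4},
        {"name": "hatchback", "price": 18000, "hauling": 3, "gas_mileage": 8},
    ]
    cars = apply_answer(cars, "most_important", "gas_mileage")  # ranking operation
    cars = apply_answer(cars, "max_price", 20000)               # filtering operation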

As it applies specifically to a game engine, you would establish a "universe of possibilities", like hallways you could walk down, tiles on a board, every possible orientation direction, every possible "weapon" that could be used, every possible "enemy individual" or "enemy groups" etc.

Then, based on the game dynamics, your job is ONLY to (see the sketch after this list):

  1. Based on rules/context/user-input, REMOVE non-viable options.
  2. Based on rules/context/user-input, SORT/RANK most preferred options.
  3. The item at the "top of the list" is selected/employed RIGHT NOW in the game.
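
Put together, those three steps are just a small function (everything here is illustrative; "filters" are predicates that must all hold, "rankers" are scoring functions):

    def choose_action(options, filters, rankers):
        """1) remove non-viable options, 2) rank what remains,
        3) take the item at the top of the list."""
        viable = [opt for opt in options if all(keep(opt) for keep in filters)]
        ranked = sorted(viable,
                        key=lambda opt: sum(score(opt) for score in rankers),
                        reverse=True)
        return ranked[0] if ranked else None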

The reason this type of AI works so well relates to the "fuzzy math" domain (a good Google search on its own), where you can intelligently apply the incomplete information that you have, without considering (or tainting your system with) information you do not have, and without "trusting" any atomic unit of information all that much (because the filtering-and-sorting tends to "average out" errors over time).

If you put a "time coefficient" on your filtering-and-sorting, (answers to old questions are increasingly considered "suspect", and old items that were "filtered out" or "sorted-to-the-bottom" are with increasing probability back-into-play), then you can get a really interesting, and dynamic, and infinitely running game.

And, that "dyanmic" and "infinitely running" game is before you add the stochastic component, which some games have. Games like Minefield and Battleship and Stratego mostly don't change during the running of the game, so your "localized-and-time-decay answers" may be sufficient for a (very) long running game. However, if you randomly generate new enemies, or your enemies move around, or there is some other random component to the "board setup" (like ocean tides where some paths are only sometimes available), then that adds a whole new level of complexity.

Ocean tides obscuring paths may be on a pseudo-regular or pseudo-randomized schedule. However, the "expert system" or "filtering" concept would suggest that you have a (possibly infinite) set of "events" (where the "ocean tide" is an "event"), and you similarly apply filtering-and-ordering to select the RIGHT NOW event (after you used filtering-and-ordering to decide that an "event" should occur at all, as opposed to all the other non-event options).


You could try implementing a Reinforcement Learning scheme. It needs a lot of math, but it is nice to use.

Edit: Here's a link for a great resource on RL.

You can use RL to filter what is important for your AI and what should be ignored. Your AI will learn from its mistakes; it will take time, but it will come to know what matters in that game and what does not.

Edit2: In a nutshell, RL enables your agent to learn from its experiences, just like we humans learn. We get burned once - we avoid touching fire. We get a reward after doing something - we keep doing it for more reward. The same goes for agents using RL.
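
As a concrete (if minimal) example of that reward-driven loop, here is the classic tabular Q-learning update; how you encode states and actions for your particular game is up to you:

    from collections import defaultdict

    q = defaultdict(float)  # Q-table: (state, action) -> estimated value

    def q_learning_update(q, state, action, reward, next_state, next_actions,
                          alpha=0.1, gamma=0.9):
        """Move the estimate for (state, action) toward the observed reward
        plus the discounted value of the best follow-up action."""
        best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])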


I'm not aware of any general strategies, but I have implemented a game with hidden moves and a decent AI.

The AI attempted to figure out what the human was up to by looking at all possible attack targets and determining which target the moves seemed to be concentrating on. The AI would try to reinforce if possible, or evacuate if it couldn't expect to hold.
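
A minimal, hypothetical sketch of that kind of threat inference (none of these names are from the actual game): score each candidate target by how many of the opponent's observed moves ended up near it, then reinforce or evacuate the most threatened one.

    def pick_threatened_target(targets, observed_moves, distance, threat_radius=3):
        """Return the target the opponent's moves seem to be converging on.
        distance is a caller-supplied metric; move.destination is assumed."""
        def threat_score(target):
            return sum(1 for move in observed_moves
                       if distance(move.destination, target) <= threat_radius)
        return max(targets, key=threat_score)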

While it could sometimes be decoyed by attacking two targets at once, this usually meant both attacks were weak, and it didn't work very well unless you were already winning.

It was tough enough that most people I gave copies to couldn't beat it at parity.
