In which we examine the problems that arise when we try to plan ahead in a world where other agents are planning against us.
Games.
In multi-agent environments, any given agent needs to consider the actions of other agents and how they affect its own welfare. The unpredictability of these other agents can introduce many possible contingencies into the agent's problem-solving process. In competitive environments, the agents' goals are in conflict, giving rise to adversarial search problems – often known as games.
Mathematical game theory, a branch of economics, views any multi-agent environment as a game provided that the impact of each agent on the others is "significant", regardless of whether the agents are cooperative or competitive. In AI, "games" are usually of a rather specialized kind – what game theorists call deterministic, turn-taking, two-player, zero-sum games of perfect information. In our terminology, this means deterministic, fully observable environments in which there are two agents whose actions must alternate and in which the utility values at the end of the game are always equal and opposite.
Example: If one player wins a game of chess (+1), the other player necessarily loses (-1). It is this opposition between the agents' utility functions that makes the situation adversarial. The state of a game is easy to represent, and agents are usually restricted to a small number of actions whose outcomes are defined by precise rules. Game playing was one of the first tasks undertaken in AI. Games, unlike most of the toy problems, are interesting because they are too hard to solve exactly.
Example: Chess has an average branching factor of about 35, and games often go to 50 moves by each player, so the search tree has about $35^{100}$ or $10^{154}$ nodes. Games, like the real world, therefore require the ability to make some decision even when calculating the optimal decision is infeasible. Games also penalize inefficiency severely. Pruning allows us to ignore portions of the search tree that make no difference to the final choice, and heuristic evaluation functions allow us to approximate the true utility of a state without doing a complete search.
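The size estimate above can be checked with a couple of lines of Python (the numbers 35 and 100 come straight from the text):

```python
import math

branching_factor = 35   # average number of legal chess moves (from the text)
depth = 100             # about 50 moves by each player

# A complete game tree of this shape has roughly b^m leaf nodes.
leaves = branching_factor ** depth
print(f"35^100 is about 10^{math.log10(leaves):.0f}")  # prints 10^154
```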
Optimal Decisions in Games.
We will consider games with 2 players, whom we will call MAX and MIN. MAX moves first, and then they take turns moving until the game is over. At the end of the game, points are awarded to the winning player and penalties are given to the loser. A game can be formally defined as a kind of search problem with the following components:
- The initial state: which includes the board position and identifies the player to move.
- A successor function: which returns a list of (move, state) pairs, each indicating a legal move and the resulting state.
- A terminal test: which determines when the game is over. States where the game has ended are called terminal states.
- A utility function (objective function, payoff function): which gives a numeric value for the terminal states. In chess, the outcome is a win, loss, or draw, with values +1, -1, or 0. Some games have a wider variety of possible outcomes; the payoffs in backgammon range from +192 to -192.

The initial state and the legal moves for each side define the game tree for the game.
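The four components above can be sketched as a minimal Python interface; the class and method names here are illustrative, not fixed by the text:

```python
class Game:
    """A two-player game posed as a search problem (sketch)."""

    def initial_state(self):
        """Board position plus which player (MAX or MIN) moves next."""
        raise NotImplementedError

    def successors(self, state):
        """Return a list of (move, resulting_state) pairs for legal moves."""
        raise NotImplementedError

    def terminal_test(self, state):
        """Return True when the game has ended."""
        raise NotImplementedError

    def utility(self, state):
        """Numeric payoff of a terminal state for MAX,
        e.g. +1 (win), -1 (loss) or 0 (draw) in chess."""
        raise NotImplementedError
```

A concrete game such as tic-tac-toe would subclass this and fill in the four methods.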
Figure 6.1 shows part of the game tree for tic-tac-toe (noughts and crosses). From the initial state, MAX has nine possible moves. Play alternates between MAX's placing an X and MIN's placing an O until we reach leaf nodes corresponding to terminal states, such that one player has three in a row or all the squares are filled.
The number on each leaf node indicates the utility value of the terminal state from the point of view of MAX; high values are assumed to be good for MAX and bad for MIN. It is MAX's job to use the search tree to determine the best move.
Figure 6.1 A (partial) search tree for the game of tic-tac-toe. The top node is the initial state and MAX moves first, placing an X in an empty square. We show part of the search tree giving alternating moves by MIN (O) and MAX (X), until we eventually reach terminal states, which can be assigned utilities according to the rules of the game.
Optimal strategies.
In a normal search problem, the optimal solution would be a sequence of moves leading to a goal state – a terminal state that is a win. In a game, on the other hand, MIN has something to say about it. MAX therefore must find a contingent strategy, which specifies MAX's move in the initial state, then MAX's moves in the states resulting from every possible response by MIN, then MAX's moves in the states resulting from every possible response by MIN to those moves, and so on.
We will see a trivial game in the figure: the possible moves for MAX at the root node are labeled $a_1$, $a_2$ and $a_3$. The possible replies to $a_1$ for MIN are $b_1$, $b_2$, $b_3$, and so on. This particular game ends after one move each by MAX and MIN. The utilities of the terminal states in this game range from 2 to 14.
Given a game tree, the optimal strategy can be determined by examining the minimax value of each node, which we write as MINIMAX-VALUE$(n)$. The minimax value of a node is the utility (for MAX) of being in the corresponding state, assuming that both players play optimally from there to the end of the game. Obviously, the minimax value of a terminal state is just its utility. Furthermore, given a choice, MAX will prefer to move to a state of maximum value, whereas MIN prefers a state of minimum value. So we have the following:

$$\text{MINIMAX-VALUE}(n) = \begin{cases} \text{UTILITY}(n) & \text{if } n \text{ is a terminal state} \\ \max_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\ \min_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node} \end{cases}$$
Minimax Algorithm:
It is applied in two-player games like chess, tic-tac-toe, etc. One player takes the role of MAX and the other the role of MIN.
The minimax algorithm (Figure 6.3) computes the minimax decision from the current state. It uses a simple recursive computation of the minimax values of each successor state, directly implementing the defining equations. The recursion proceeds all the way down to the leaves of the tree, and then the minimax values are backed up through the tree as the recursion unwinds.
Example: In the trivial game tree above, the algorithm first recurses down to the three bottom-left nodes and uses the UTILITY function on them to discover that their values are 3, 12 and 8 respectively. Then it takes the minimum of these values, 3, and returns it as the backed-up value of node B. A similar process gives the backed-up values of 2 for C and 2 for D. Finally, we take the maximum of 3, 2 and 2 to get the backed-up value of 3 for the root node. The minimax algorithm performs a complete depth-first exploration of the game tree. If the maximum depth of the tree is m, and there are b legal moves at each point, then the time complexity of the minimax algorithm is $O(b^m)$. The space complexity is $O(bm)$ for an algorithm that generates all successors at once, or $O(m)$ for an algorithm that generates successors one at a time.
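The worked example can be reproduced with a short recursive sketch. The nested-list tree encoding is my own, and the individual leaf values under C and D are assumed, chosen to be consistent with the backed-up values of 2 quoted above:

```python
def minimax_value(node, maximizing):
    """Back up minimax values through a tree given as nested lists;
    a bare number is a terminal state's utility."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax_value(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# MAX moves a1/a2/a3 at the root; MIN then chooses at B, C or D.
B, C, D = [3, 12, 8], [2, 4, 6], [14, 5, 2]

print(minimax_value(B, maximizing=False))          # 3: MIN takes min(3, 12, 8)
print(minimax_value([B, C, D], maximizing=True))   # 3: MAX takes max(3, 2, 2)
```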
function MINIMAX-DECISION(state) returns an action
inputs: state, current state in game
v $\leftarrow$ MAX-VALUE(state)
return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v $\leftarrow$ $-\infty$
for a, s in SUCCESSORS(state) do
v $\leftarrow$ MAX(v, MIN-VALUE(s))
return v

function MIN-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v $\leftarrow$ $+\infty$
for a, s in SUCCESSORS(state) do
v $\leftarrow$ MIN(v, MAX-VALUE(s))
return v
Figure 6.3 An algorithm for calculating minimax decisions. It returns the action corresponding to the best possible move, that is, the move that leads to the outcome with the best utility, under the assumption that the opponent plays to minimize utility. The functions MAX-VALUE and MIN-VALUE go through the whole game tree, all the way to the leaves, to determine the backed-up value of a state.
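A direct Python rendering of Figure 6.3 might look like the sketch below. Supplying the game as three plain callables (`successors`, `terminal_test`, `utility`) is an assumption of this sketch, not part of the pseudocode:

```python
import math

def minimax_decision(state, successors, terminal_test, utility):
    """MINIMAX-DECISION: pick the action whose resulting state has the
    highest backed-up value, assuming MIN replies optimally."""
    def value_of(pair):
        move, result = pair
        return min_value(result, successors, terminal_test, utility)
    return max(successors(state), key=value_of)[0]

def max_value(state, successors, terminal_test, utility):
    """MAX-VALUE: the best utility MAX can guarantee from this state."""
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for move, s in successors(state):
        v = max(v, min_value(s, successors, terminal_test, utility))
    return v

def min_value(state, successors, terminal_test, utility):
    """MIN-VALUE: the worst utility MIN can force on MAX from this state."""
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for move, s in successors(state):
        v = min(v, max_value(s, successors, terminal_test, utility))
    return v

# The trivial game again: a state is a dict mapping moves to subtrees,
# and a bare number is a terminal state's utility.
tree = {"a1": {"b1": 3, "b2": 12, "b3": 8},
        "a2": {"c1": 2, "c2": 4, "c3": 6},
        "a3": {"d1": 14, "d2": 5, "d3": 2}}
succ = lambda s: list(s.items())
term = lambda s: not isinstance(s, dict)
util = lambda s: s

print(minimax_decision(tree, succ, term, util))  # prints a1
```

MAX's best move is $a_1$, since its backed-up value (3) beats those of $a_2$ and $a_3$ (both 2), matching the worked example.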