Nine Men’s Morris, Minimax, and Alpha Beta Pruning Search

Adversarial Search:
Nine Men’s Morris, Minimax, and Alpha Beta Pruning
Search Algorithms
Artificial Intelligence Laboratory
This document is a TAILS module.
Stephanie E. August
Matthew J. Shields
October 25, 2015
Loyola Marymount University
1 The Idea
1.1 Purpose
The purpose of this lab is to familiarize the experimenter with the adversarial search algorithm
named minimax via an interactive two-player game called Nine Men’s Morris. The
experimenter is provided with the interactive game and a working implementation of the basic
limited-depth minimax algorithm. While performing the experiments in this lab, the student
learns how slight modifications to the algorithm can greatly enhance its performance. First,
the student is instructed to add Alpha-Beta pruning to eliminate from consideration portions
of the search space that cannot affect the move that minimax selects. Next, the student
is encouraged to modify the evaluation function and the search depth cut-off to learn how
those changes affect game play.
1.2 Background
1.2.1 Minimax Algorithm
Minimax is an adversarial search algorithm commonly used to choose moves in turn-based,
two-player games. This recursive algorithm implements a depth-first search of a game
tree. The game tree is built by playing the game out in memory under the assumption that each
side tries to maximize the possible loss of the other. In other words, the game tree is built
such that the root node is the current state of the game and the subsequent levels of the tree
alternately represent the hypothetical moves of the starting player and the adversary. The
children of the root node include all of the possible game states for each move that the current
player can make on their turn. The root node’s grandchildren include all of the possible game
states two turns, or two ply, in the future, that is, all of the possible moves that the adversary
can make in response to the starting player’s move. The great-grandchildren of the root include
all of the possible game states three ply into the future, where the starting player is responding
to the adversary’s move, and so on. The algorithm assumes that the player whose turn
is represented by the root node wants to maximize the value of the root’s children, that
is, select a move that is to the starting player’s maximum advantage, given the moves that
the player’s opponent might make. The next depth of the tree portrays the moves that the
adversary might take in response to the player’s initial move. The algorithm assumes that
the adversary would like to maximize the adversary’s own advantage, which has the effect of
inflicting the greatest possible damage on the starting player.
Figure 1: 2-ply game tree for Tic-Tac-Toe showing MINIMAX Algorithm.
A search tree for tic-tac-toe is presented in Figure 1 to illustrate this. The evaluation function
in this example was taken from Russell and Norvig (2003). $X_n$ is defined as the number of
rows, columns, or diagonals with exactly n X’s and no O’s. $O_n$ is defined as the number of
rows, columns, or diagonals with exactly n O’s and no X’s. The utility function assigns +1 to any
position with $X_3 = 1$ and -1 to any position with $O_3 = 1$. All other terminal positions have
utility 0. For each non-terminal position s the linear evaluation function is defined as
$\mathrm{Eval}(s) = (3X_2(s) + X_1(s)) - (3O_2(s) + O_1(s))$
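The evaluation function above can be sketched in a few lines of Python. This is only an illustration: the board representation (a list of nine cells holding "X", "O", or " ") and the names LINES, count_lines, and evaluate are assumptions made for this example and are not taken from the lab software.

```python
# All eight rows, columns, and diagonals of a 3x3 board, as cell indices 0..8.
# (Hypothetical layout: index = 3 * row + column.)
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def count_lines(board, player, n):
    """Count lines holding exactly n marks of player and none of the opponent."""
    opponent = "O" if player == "X" else "X"
    total = 0
    for line in LINES:
        marks = [board[i] for i in line]
        if marks.count(player) == n and marks.count(opponent) == 0:
            total += 1
    return total


def evaluate(board):
    """Linear evaluation for non-terminal positions: (3*X2 + X1) - (3*O2 + O1)."""
    x_score = 3 * count_lines(board, "X", 2) + count_lines(board, "X", 1)
    o_score = 3 * count_lines(board, "O", 2) + count_lines(board, "O", 1)
    return x_score - o_score


# X in the centre of an empty board sits on four open lines.
center_only = list("    X    ")
# O then takes a corner: X keeps three open lines, O has two.
center_vs_corner = list("O   X    ")
```

With these definitions, evaluate(center_only) is 4 and evaluate(center_vs_corner) is 3 - 2 = 1.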
As the minimax algorithm runs, it generates the MIN nodes below the root and, for each MIN
node in turn, generates and evaluates the positions below it, working from left to right.
In this example the search ends at a 2-ply depth. Since minimax is a recursive function, each
MIN node receives the evaluations of its children and keeps the minimum. This value becomes
the MIN node’s MINIMAX backed-up value. The MIN nodes then return their backed-up values
to the root node, which keeps the maximum. The automated player then selects the move that
leads to the MIN node with the maximum backed-up value. This move gives the current player
the best outcome it can guarantee within the search depth, assuming the adversary plays
according to the same evaluation. This search is repeated, with all nodes regenerated and
reevaluated, for every move that the agent makes.
Figure 2: Two-ply game tree for Tic-Tac-Toe showing pruned branches circled in red.
1.2.2 Alpha-Beta Pruning
Alpha-Beta pruning is found in almost every implementation of the MINIMAX algorithm. This is
because it is added to the algorithm with very little effort, and it can
greatly increase the performance of the algorithm by pruning off branches that do not need
to be evaluated. It does this by passing two new values, alpha and beta, into the recursive
maxValue and minValue calls. Alpha is the best value found so far for MAX along the current
path, and beta is the best value found so far for MIN. The maxValue function raises alpha while
the minValue function lowers beta. When in a MAX ply, if a child node evaluates to a value that
is greater than or equal to beta, then all remaining children are pruned without being
evaluated. When in a MIN ply the opposite is true: if a child node evaluates to a value
that is less than or equal to alpha, then all remaining children are pruned without being
evaluated. The main idea for the MAX ply is that if a child evaluates to something greater
than or equal to beta, then the MIN player already has an alternative at least as good elsewhere
and will never allow play to reach this node, so the branch is abandoned; the mirror-image
argument holds for the MIN ply. Figure 2 shows the branches that would be pruned from the 2-ply
Tic-Tac-Toe search tree circled in red.
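The cut-off tests described above can be sketched as follows. This is a hedged illustration rather than the lab's maxValue and minValue code: it folds both into one function, takes the game through hypothetical successors and evaluate callbacks, and records every evaluated leaf in a list named visited so that the pruning can be observed. The example tree is arbitrary and is not the tree from Figure 2.

```python
import math


def alphabeta(node, depth, alpha, beta, maximizing, successors, evaluate, visited):
    """Minimax value of node with alpha-beta pruning; visited collects leaves."""
    children = successors(node)
    if depth == 0 or not children:
        visited.append(node)
        return evaluate(node)
    if maximizing:
        value = -math.inf
        for child in children:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False,
                                         successors, evaluate, visited))
            if value >= beta:   # MIN has a better option elsewhere: prune
                break
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in children:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True,
                                     successors, evaluate, visited))
        if value <= alpha:      # MAX has a better option elsewhere: prune
            break
        beta = min(beta, value)
    return value


# Hypothetical game: nested lists are positions, numbers are leaf evaluations.
def successors(node):
    return node if isinstance(node, list) else []


def evaluate(node):
    return node


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
value = alphabeta(tree, 2, -math.inf, math.inf, True, successors, evaluate, visited)
```

On this tree the search returns 3 after evaluating seven of the nine leaves: once the second MIN node sees the leaf 2, which is no better than the 3 already guaranteed, its remaining leaves 4 and 6 are skipped.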
1.2.3 Nine Men’s Morris
Nine Men’s Morris (9MM) is a strategy board game in the windmills game category. According
to Botermans and Fankbonner, the first stone windmills game board was found in a
small Irish town’s graveyard. That board dated back to 2000 BCE. Another windmills game
board was found on the ceiling of an ancient Egyptian temple in Kurna, which dated
back to 1400 BCE. The oldest Nine Men’s Morris version of the windmills games was found
in Ceylon, where the game board was engraved in the steps of a hill in Mihintale between 17 CE
and 19 CE. This version of the game gained popularity throughout Europe during the Middle
Ages around 1600 CE. It has since lost its popularity, and many people have never heard of
this game. The game has very simple rules yet requires quite a bit of strategy to master.
Nine Men’s Morris is considered a non-trivial game.
Rules: The board consists of three concentric squares with the sides of the squares connected
at their midpoints by four lines, as shown in Figure 3. Each intersection is a space on the
board that can be occupied by one piece from either player.
Figure 3: Nine Men’s Morris game board.
This game belongs to a category of games referred to as the windmills. In this game a mill
(short for windmill) is three pieces belonging to the same player in a row. When a player makes
a mill, the player must remove one of their opponent’s pieces from any space on the board that is
not also in a mill. There are two ways to win: either the opponent has
only 2 pieces remaining on the board, or the opponent has no moves available. The game
has three distinct phases of play: populate, fight, and flight. Populate and fight always
occur. Flight might or might not occur, depending on how the victor wins the game. The rules
for mills as described above apply in all phases of the game.
Each player starts the game in the populate phase. During this phase the players take turns
placing each of their nine pieces onto empty spaces on the board. Blue always goes first.
Players are not allowed to move a piece that has already been placed on the board. When
both players have placed all of their pieces on the board, both players enter the fight phase
of the game.
During the fight phase of the game players take turns moving one of their own pieces to
a vacant adjacent space. Game play at this point is according to each player’s winning
strategy. A player might try to block all of the opponent’s moves, or attempt to build mills
and remove all of the opponent’s pieces.
When a player has only three pieces remaining on the board, the player with three pieces
and only that player enters the flight phase. During this phase a player can move to any
open space on the board even if it is on the opposite side of the board.
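The movement rules for the three phases can be sketched as a small move generator. Several things here are assumptions made for illustration only: nodes are numbered 0 to 23 reading the standard board left to right and top to bottom, the board is a list of 24 cells holding "e", "r", or "b", and the names MILLS, ADJACENT, legal_destinations, and in_mill are hypothetical. Adjacency is derived from the mill lines, because on the standard board two spaces are adjacent exactly when they are neighbours on one of the sixteen lines.

```python
# The sixteen possible mills under the assumed 0..23 numbering.
MILLS = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11),
    (12, 13, 14), (15, 16, 17), (18, 19, 20), (21, 22, 23),
    (0, 9, 21), (3, 10, 18), (6, 11, 15), (1, 4, 7),
    (16, 19, 22), (8, 12, 17), (5, 13, 20), (2, 14, 23),
]

# Neighbouring spaces: consecutive members of each line are adjacent.
ADJACENT = {node: set() for node in range(24)}
for a, b, c in MILLS:
    ADJACENT[a].add(b)
    ADJACENT[b].update((a, c))
    ADJACENT[c].add(b)


def legal_destinations(board, node, phase):
    """Spaces the piece on node may move to in the given phase."""
    empty = {i for i, cell in enumerate(board) if cell == "e"}
    if phase == "FLIGHT":
        return empty                    # may jump to any open space
    if phase == "FIGHT":
        return ADJACENT[node] & empty   # only to a vacant adjacent space
    return set()                        # POPULATE: placed pieces do not move


def in_mill(board, node):
    """True if the piece on node is part of a completed mill."""
    color = board[node]
    return color != "e" and any(
        node in mill and all(board[i] == color for i in mill) for mill in MILLS)


board = ["e"] * 24
board[0], board[1] = "b", "r"
```

With blue on node 0 and red on node 1, legal_destinations(board, 0, "FIGHT") is {9}, while in the flight phase the same piece may move to any of the 22 empty spaces.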
2 Applications
The Minimax algorithm is an essential adversarial search algorithm that has been applied
to problems ranging from zero-sum game play to real-time pursuit evasion. The application
of the Minimax algorithm to real-world problems is no different from that of any other
algorithm in that it is often combined into a hybrid with other algorithms or concepts. The
following three real-world applications are considered below: Deep Blue, Envelope
Constrained Filters, and Pursuit Evasion.
Deep Blue II was a supercomputer designed by IBM to play chess at a grandmaster
level. In general, Deep Blue implements a depth-limited Minimax algorithm with alpha-beta
pruning and NegaScout, which incorporates singular extensions. To improve the efficiency
of alpha-beta pruning, the move generator was designed such that the best moves were
generated first. Singular extension algorithms identify the interesting branches of the search
tree while it is being searched. Deep Blue used singular extensions to determine which
branches it should search to a deeper ply, as deep as 30 or 40 ply (i.e., 30 or 40 consecutive
moves). Singular extensions also allowed Deep Blue to search branches that would normally
be cut off during alpha-beta pruning. Deep Blue used a combination of software and 256
processors working in parallel that were specifically designed to run a variation of Minimax
with an extremely intelligent evaluation function. The software would start the search and
introduce changes to the search such as singular extensions; it would then pass the search of
the last 5 ply to the hardware [Ham]. The processors could analyze a combined 200 million
board positions per second [Kur]. Deep Blue beat the World Chess Champion
Kasparov in 1997 with a match score of 3.5 to 2.5.
Envelope Constrained Filters (ECFs) provide an output response to inputs that are within
a predefined upper and lower limit. The upper and lower limits are defined by time functions
which form the filter output window in the time domain [Pet1]. ECFs are used for
radar pulse compression and had previously been implemented using a linear programming
approach. This linear programming approach had an inherent design problem: it could not
minimize the sidelobes of the filter response. Petrovic et al. developed a non-linear approach
based on the Minimax algorithm which could successfully minimize the sidelobes. Sidelobes
are weaker copies of the main beam that appear alongside it; they are essentially noise.
Sidelobes are caused by the shape of the producing antenna’s aperture. Figure 4 shows a
diagram of a typical antenna pattern and what sidelobes look like.
Honeywell has developed a 3D Minimax Pursuit Evasion algorithm. This algorithm assumes
that the missile and aircraft can both accelerate perpendicular to their velocity and that the
aircraft can maneuver in an optimal way to avoid the missile [Fri]. In this application of
the Minimax algorithm the pursuer (the missile) wants to minimize the miss distance and the
evader (the aircraft) wants to maximize some cost function.
These examples provide some understanding of how the Minimax algorithm can be used to
solve real world problems.
Figure 4: Typical Antenna Pattern [Mr. PIM at English Wikipedia, April 6, 2007]
3 Input/Process/Output
3.1 Input
The input to this laboratory includes two distinct pieces: player moves and adversarial
agent moves. Experiments performed over the course of this laboratory will require the
experimenter to modify the adversarial agent code and play games against the modified
agent to evaluate its performance.
3.2 Process
The process of this laboratory is to have the experimenter modify the adversarial agent and
play games against that agent to demonstrate the effect of changes made to the agent code.
The player by default is blue and makes the first move to start the game. The rules specifying
how the game is played can be found in Section 1.2 (Background) of the idea statement.
3.3 Output
The output of this laboratory will be a set of statistics for each experiment showing the number
of wins, losses, and ties that each modified agent achieved against its human competitor.
The statistics, although somewhat skewed by the human aspect of the process, should show
the improvement or degradation of the adversarial agent’s performance for each of the code
changes performed in each experiment. The author of this laboratory realizes that a more
solid scientific approach would be to have a static agent, rather than a human player,
play against an agent that is modified for each experiment. This would take the human
aspect out of the process, but in the author’s opinion it would also take the fun out of the
laboratory. The fun factor is important here because the goal of this laboratory is to spark
the experimenter’s interest in artificial intelligence.
4 Design Description
4.1 Introduction
4.1.1 Purpose
This software design document describes the architecture and system design of the Nine
Men’s Morris computer game.
4.1.2 Scope
The purpose of the Nine Men’s Morris game is to familiarize the reader with the adversarial
search algorithm named Minimax via an interactive two-player game called Nine Men’s
Morris. The reader will be provided with the interactive game and a working implementation
of the basic limited-depth Minimax algorithm. This project will be designed such that the
reader can modify or improve the Minimax algorithm with no impact on the game engine’s core.
The main focus of this design document will be on the Minimax algorithm and the state
description used to define a single instance of a Nine Men’s Morris game.
4.1.3 Overview
This document is organized in a top-down manner, starting with a system overview which
will provide a general description of the functionality, context, and design of the Nine Men’s
Morris laboratory. Next, a high-level overview of the system architecture will be provided,
which will include the architectural design, a decomposition description, and a rationale for
the design decisions that were made for this system. Next, a detailed description of the data
design will be covered, which will define the state description of a single instance of a Nine
Men’s Morris game. Next, a low-level description of the Minimax components will be specified,
including brief descriptions of each component, their attributes, and operations. Finally, an
overview of the Human Machine Interface (HMI) will be provided, including instructions
detailing how the HMI is used.
4.1.4 Graphical Notation
The graphical notation used in the diagrams and throughout this document is described
in the table below:
4.2 System Overview
The Nine Men’s Morris laboratory consists of one to three possible active components at
any given time. The HMI component is the only component required to play the Nine Men’s
Morris game, but optionally player one and/or player two can be played by adversarial
agents. In Figure 5 the user interacts with the HMI by either playing against themselves or
by selecting an agent for player one and/or player two. If the user selects an agent for player
one and/or player two, the HMI listens on a port and launches the agent program, which
then attaches to the listening port. Player one and player two agents both have their own
port numbers and attach independent of one another.
4.3 System Architecture
4.3.1 Architectural Design
As shown in Figure 6, the software architecture for the Nine Men’s Morris system consists
of one to three components. The HMI enforces all of the game rules and provides the user
with an interactive interface to evaluate the performance of one or two adversarial agents
simultaneously. The user can select an agent to play as player one and an agent to play as
player two. Over the course of this laboratory the user may be instructed to create a new
enhanced agent. The user can then play their enhanced agent against the baseline agent to
evaluate the performance of the enhanced agent.
Figure 5: System Overview Diagram - The focus in this diagram is the HMI
Figure 6: Software Architecture Diagram
4.3.2 Decomposition Description
The decomposition description is limited to the baseline agent that is provided with this
laboratory. Figure 7 shows the Class Association Diagram for the baseline adversarial agent.
An adversarial agent program is composed of its socket information and a Minimax object.
The Minimax object encapsulates all of the attributes and operations needed to perform a
Minimax search as well as the current board state. A Nine Men’s Morris state is composed
of all of the state information about a single instance of a game, including a board state
and the action taken to achieve that state. An action is composed of a start node and a
destination node, which represent the node from which a piece was moved and the node to
which it was moved. A game board is composed of nodes and mills, both of which contain
the details about where all of the pieces can be found on the board.
Figure 8 details the sequence of events during nominal game play with two agents playing
against one another. This sequence diagram is meant to capture the communications between
the HMI and the agents.
Figure 9 details the sequence of events that take place within an agent during nominal game
play. This sequence diagram is meant to capture the communications between each of the
components that make up an agent.
4.3.3 Design Rationale
The system architecture as shown in Figure 6 above separates the agents from the HMI. The
reason this separation was architected into this project was to separate agent implementation
from the game’s framework. The decision to attach the agents to the HMI through a TCP
socket connection was made to allow an agent to be implemented in any programming
language that supports TCP sockets. This provides the flexibility necessary to allow a
programmer to write an agent in the programming language of their choice.
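The socket-based separation can be illustrated with a loopback sketch in which one thread plays the role of the HMI and another plays the role of an agent. Everything about the exchange here is an assumption made for illustration: the lab's actual message format, port numbers, and handshake are not described in this document, so the one-line board message, the one-line reply, and the names run_agent and demo are hypothetical.

```python
import socket
import threading


def run_agent(port, choose_move):
    """Agent side: attach to the listening HMI, read a board line, send a reply."""
    with socket.create_connection(("127.0.0.1", port)) as conn, \
            conn.makefile("r") as reader:
        board = reader.readline().strip()
        conn.sendall((choose_move(board) + "\n").encode())


def demo():
    """HMI side: listen, launch the agent, send a board, and wait for its move."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        # Hypothetical agent policy: answer with the index of the first empty node.
        def first_empty(board):
            return str(board.split().index("e"))

        agent = threading.Thread(target=run_agent, args=(port, first_empty))
        agent.start()
        conn, _ = server.accept()
        with conn, conn.makefile("r") as reader:
            # 24 space-separated characters: blue on the first node, rest empty.
            conn.sendall((" ".join("b" + "e" * 23) + "\n").encode())
            reply = reader.readline().strip()
        agent.join()
    return reply
```

Because the two sides share nothing but a TCP stream of text lines, the agent half of this sketch could be rewritten in any language with socket support, which is the flexibility the design rationale describes.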
Figure 7: Adversarial Agent Class Association Diagram
Figure 8: Sequence Diagram - High level showing nominal game play.
Figure 9: Sequence Diagram - Low level showing communication between agent components.
5 Implementation-Specific Design Description
5.1 Data Design
5.1.1 Data Description
The baseline agent stores its state data in the form of a search tree where each node of the
tree contains the following key information:
• The board state
• The action taken to reach the board state
• The turn count
• The blue player’s current phase of game play
• The red player’s current phase of game play
• A value which is the utility of the state
• A reference to the child state with the highest utility
The board state is queried from the HMI, which returns a string of 24 characters separated
by spaces. Each character belongs to the set {e, r, b}, where ’e’ denotes an empty node, ’r’
denotes a node populated by a red piece, and ’b’ denotes a node populated by a blue piece.
The nodes are ordered in the list by reading the board from left to right and top to bottom.
The position of each character in the string of 24 characters is shown in Figure 10, which maps
the node identifiers to the game board. The action taken to reach the board state is populated
for all states with the exception of the initial state passed into the Minimax algorithm.
The turn count is the sum of the number of turns taken by player one and player two
since the beginning of the current game. The blue player’s current phase belongs to the set
{POPULATE, FIGHT, FLIGHT}, where “POPULATE” means the player still has pieces that
are not on the board; “FIGHT” means the player has all their pieces on the board and
has left the “POPULATE” phase; and “FLIGHT” means the player has only three pieces
remaining on the board. The red player’s current phase is defined in the same way.
The value which is the utility of the board state is populated by the Minimax adversarial
search algorithm during execution.
The reference to the successor state with the largest value exists merely for convenience.
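The board string and phase values described above can be handled with a few lines of Python. This is a sketch under stated assumptions: the function names parse_board and current_phase are hypothetical, and the number of pieces a player has not yet placed is passed in as a separate argument (pieces_in_hand) because the board string alone does not carry it.

```python
def parse_board(text):
    """Split the HMI board string into a list of 24 cells, each 'e', 'r', or 'b'."""
    cells = text.split()
    if len(cells) != 24 or any(cell not in ("e", "r", "b") for cell in cells):
        raise ValueError("expected 24 space-separated characters from {e, r, b}")
    return cells


def current_phase(cells, color, pieces_in_hand):
    """Return POPULATE, FIGHT, or FLIGHT for the player with the given color."""
    if pieces_in_hand > 0:
        return "POPULATE"   # still has pieces that are not on the board
    if cells.count(color) == 3:
        return "FLIGHT"     # only three pieces remaining on the board
    return "FIGHT"


# Example: blue has three pieces on the board, red has four.
example = parse_board("b b b r r r r " + " ".join("e" * 17))
```

Here current_phase(example, "b", 0) is "FLIGHT" and current_phase(example, "r", 0) is "FIGHT"; while either player still holds unplaced pieces the result is "POPULATE".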
Figure 10: Map of Nodes on Nine Men’s Morris board.
5.1.2 Data Dictionary
Details about every class, including their attributes and methods, are listed below:
Class Summary
Class      Description
Action     The Action class is a simple class that defines the details about a move in the Nine Men’s Morris board game.
Agent      The Agent class is the adversarial agent that knows how to play the board game Nine Men’s Morris.
Board      The Board class contains the state of a single instance in a Nine Men’s Morris game.
Main       The Main class conducts the execution of the agent program.
Mill       The Mill class is a simple class that defines the details about a single mill in the Nine Men’s Morris board game.
MiniMax    The MiniMax class encapsulates all of the fields and methods used for analyzing a Nine Men’s Morris game state using the Artificial Intelligence adversarial search algorithm called MiniMax.
Node       The Node class is a simple class that defines the details about a node in the Nine Men’s Morris board game.
State      The State class contains all of the state details for a single turn in a Nine Men’s Morris game.
Action Attribute Detail
Type    Attribute          Description
Node    destinationNode    The node on the game board where a player’s piece can move.
Node    opponentNode       If not null then the opponent’s piece that will be removed from the board as the result of this player creating a mill.
Node    startNode          The node on the game board where a player’s piece is currently located.
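The three Action attributes can be mirrored in a small Python data class. This is only a sketch of the data structure: the lab's own class is not shown in this document, the snake_case field names are Python renderings of startNode, destinationNode, and opponentNode, the Node class is reduced to a hypothetical index field, and allowing start_node to be None for a placement during the populate phase is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """A space on the board; the single index field is a simplification."""
    index: int


@dataclass(frozen=True)
class Action:
    """One move: where the piece comes from, where it goes, and any capture."""
    start_node: Optional[Node]            # assumed None when placing a new piece
    destination_node: Node                # where the player's piece moves
    opponent_node: Optional[Node] = None  # piece removed after forming a mill

    def removes_piece(self):
        """True if this action also removes one of the opponent's pieces."""
        return self.opponent_node is not None


slide = Action(start_node=Node(0), destination_node=Node(9))
capture = Action(start_node=Node(1), destination_node=Node(2), opponent_node=Node(5))
```

Making the classes frozen keeps actions immutable, so one Action instance can be shared safely between the states of a search tree.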
6 Test suite and Drivers
7 Experiments
This lab consists of several distinct activities or experiments, listed below. Each activity
provides the experimenter with the information necessary to complete this lab.
7.1 Location
All of the experiments and exercises of this section are located in Adversarial Search/Experiments.
7.2 Experiment 1: Learn the 9MM game by playing it
This section will contain information about how to start and play the TAILS version of
9MM, and a general description of what the user can expect to see while running the application.
7.3 Experiment 2: Study the design of the code
a. Sketch a design for the 9MM code as you envision it. The design can be graphical or
narrative, high level, and contained within one or two pages.
b. Study the implementation-independent design description provided.
c. How close was your design to the actual design? Record your thoughts.
d. What is the biggest surprise about the actual design of the code? Record your thoughts.
e. What part of the design is the most challenging to understand? Record your thoughts.
f. Exchange your design and reflection with your lab partner. Do your designs complement
one another or conflict? Discuss any points where they differ. Summarize
your team’s discussion.
g. Discuss the implementation-independent design description provided, especially the
aspects you each found challenging. Summarize your discussion.
8 Source Code
8.1 Location
The source code for this module can be found under the Software Application/Agent Architecture Application/ Program folder path.
8.2 Source code’s headers
The main source files for the online application of this module begin with a header which
contains the following information:
<filename>.<file extension>
A short description of how and where the code is being used.
<filename>.<file extension>
Author’s name
If one exists: a change history.
9 Complexity Analysis
9.1 Time and Space Complexity of MINIMAX algorithm
Let us imagine a partial game tree for the tic-tac-toe game, like the one in Figure 11, searched
using the minimax algorithm.
Figure 11: Partial tree for a Tic-Tac-Toe game
9.1.1 Time
All the nodes in the tree have to be generated once at some point, and the assumption is
that it costs a constant time c to generate a node (constant times can vary, you can just
pick c to be the highest constant time to generate any node). The order is determined by
the algorithm and ensures that nodes don’t have to be repeatedly expanded.
If we denote the branching factor by $b$, we see from the figure that it costs $c\,b^0$ to
generate level zero. The next level in the tree has $b^1$ nodes and costs
$c\,b^1$ to generate. Continuing in this way, the cost to generate the
$m$th level is $c\,b^m$.
At the deepest level of the tree, at depth $d$, there are $b^d$ nodes, so the work at that level
is $c\,b^d$. The total amount of work done to this point is $c\,b^0 + c\,b^1 + \dots + c\,b^d$.
For the complexity we only look at the fastest rising term and drop the constant, so we get:
$O(c + c\,b + c\,b^2 + \dots + c\,b^d) = O(b^d)$
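The step from the sum to the final bound can be made explicit with the closed form of a geometric series. The derivation below assumes a uniform tree with branching factor $b \ge 2$; $c$, $b$, and $d$ are the symbols used above.

```latex
\[
\sum_{i=0}^{d} c\,b^{i}
  = c\,\frac{b^{d+1}-1}{b-1}
  < c\,\frac{b}{b-1}\,b^{d}
  \le 2c\,b^{d}
  \qquad (b \ge 2),
\]
\[
\text{so}\quad O\!\left(c + c\,b + c\,b^{2} + \dots + c\,b^{d}\right) = O\!\left(b^{d}\right).
\]
```

For example, with $b = 3$ and $d = 2$ the tree has $1 + 3 + 9 = 13$ nodes, and $(3^{3} - 1)/(3 - 1) = 13$.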
9.1.2 Space
Let us assume a smaller tree with branching factor b = 3 and the special notation used in
Figure 12. The figure shows the algorithm at different stages for b = 3. A star {*} indicates
currently expanded nodes, a question mark {?} indicates unknown nodes, and a plus sign {+}
indicates nodes whose score has been fully calculated.
Figure 12: A small tree with b = 3
In order to calculate the score of a node, you expand the node, pick a child, and recursively
expand until you reach a leaf node at depth $d$. Once a child node is fully calculated you
move on to the next child node. Once all $b$ child nodes are calculated, the parent’s score
is calculated based on the children, and at that point the child nodes can be removed from
storage. This is illustrated in the figure above, where the algorithm is shown at 4 different
stages.
At any time you have one path expanded and you need $c \cdot b$ storage to store all the child
nodes at every level. Here again the assumption is that you need a constant amount of space
per node. The key is that any subtree can be summarised by its root. Since the maximal length
of a path is $d$, you will need at most $c \cdot b \cdot d$ of space. As above, we can drop
constant terms and we get $O(c \cdot b \cdot d) = O(b \cdot d)$.
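The two counts can be checked numerically with a short simulation of the depth-first expansion on a uniform tree. This is a sketch for illustration: the function name explore is hypothetical, nodes are only counted rather than built, and the root is included in both totals.

```python
def explore(b, d):
    """Return (nodes generated, peak nodes stored) for a depth-first expansion
    of a uniform tree with branching factor b and depth d."""
    generated = 1   # the root
    stored = 1
    peak = 1

    def expand(depth):
        nonlocal generated, stored, peak
        if depth == d:
            return              # leaf: nothing further to generate
        generated += b          # generate all b children of this node
        stored += b
        peak = max(peak, stored)
        for _ in range(b):
            expand(depth + 1)   # fully score one child before the next
        stored -= b             # children are summarised by their parent

    expand(0)
    return generated, peak


small = explore(3, 2)
larger = explore(3, 4)
```

For b = 3 and d = 2 this gives 13 generated nodes and a peak of 7 stored nodes (b · d plus the root); for d = 4 it gives 121 generated nodes and a peak of 13, so time grows like $b^d$ while storage grows like $b \cdot d$.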
9.2 Time and Space Complexity of Alpha-Beta Pruning
The benefit of alpha–beta pruning lies in the fact that branches of the search tree can be
eliminated. This way, the search time can be limited to the ’more promising’ subtree, and
a deeper search can be performed in the same time. Like its predecessor, it belongs to
the branch and bound class of algorithms. The optimization reduces the effective depth to
slightly more than half that of simple minimax if the nodes are evaluated in an optimal or
near optimal order (best choice for side on move ordered first at each node).
9.2.1 Time
With an (average or constant) branching factor of $b$ and a search depth of $d$ plies, the
maximum number of leaf node positions evaluated (when the move ordering is pessimal) is
$O(b \cdot b \cdots b) = O(b^d)$, the same as a simple minimax search. If the move ordering for
the search is optimal (meaning the best moves are always searched first), the number of leaf
node positions evaluated is about $O(b \cdot 1 \cdot b \cdot 1 \cdots b)$ for odd depth and
$O(b \cdot 1 \cdot b \cdot 1 \cdots 1)$ for even depth, or $O(b^{d/2})$. In the latter case, where
the ply of a search is even, the effective branching factor is reduced to its square root, or,
equivalently, the search can go twice as deep with the same amount of computation.
The explanation of $b \cdot 1 \cdot b \cdot 1 \cdots$ is that all the first player’s moves must be
studied to find the best one, but for each, only the best second player’s move is needed to refute
all but the first (and best) first player move; alpha-beta ensures no other second player moves
need be considered. When moves are examined in random order, alpha-beta pruning examines
roughly $O(b^{3d/4})$ nodes.
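The best-case behaviour can be observed on a small random tree whose children have been sorted so that the best move always comes first. This is an experiment sketch rather than part of the lab code: the nested-list tree and the function names build, minimax, order_best_first, and count_leaves are hypothetical, and the ordering step cheats by running a full minimax first, which a real agent cannot afford.

```python
import math
import random


def build(b, d, rng):
    """Uniform random tree: nested lists with numeric leaves at depth d."""
    if d == 0:
        return rng.random()
    return [build(b, d - 1, rng) for _ in range(b)]


def minimax(node, maximizing):
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def order_best_first(node, maximizing):
    """Sort every node's children so the best move for the side to move is first."""
    if not isinstance(node, list):
        return node
    children = [order_best_first(child, not maximizing) for child in node]
    return sorted(children, key=lambda c: minimax(c, not maximizing),
                  reverse=maximizing)


def count_leaves(node, alpha, beta, maximizing, counter):
    """Alpha-beta search; counter[0] accumulates the number of leaves evaluated."""
    if not isinstance(node, list):
        counter[0] += 1
        return node
    value = -math.inf if maximizing else math.inf
    for child in node:
        result = count_leaves(child, alpha, beta, not maximizing, counter)
        if maximizing:
            value = max(value, result)
            if value >= beta:
                break
            alpha = max(alpha, value)
        else:
            value = min(value, result)
            if value <= alpha:
                break
            beta = min(beta, value)
    return value


raw = build(3, 4, random.Random(0))
ordered = order_best_first(raw, True)
raw_count, ordered_count = [0], [0]
raw_value = count_leaves(raw, -math.inf, math.inf, True, raw_count)
ordered_value = count_leaves(ordered, -math.inf, math.inf, True, ordered_count)
```

For b = 3 and d = 4 plain minimax evaluates all $3^4 = 81$ leaves, while alpha-beta on the best-first tree evaluates 17, which matches the classical best-case count $b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1$ and is on the order of $b^{d/2}$. The unsorted tree falls somewhere between the two.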
9.2.2 Space
The effectiveness of alpha-beta pruning is highly dependent on the order in which the
states are examined. In an optimized order, alpha-beta needs to examine only $O(b^{d/2})$ nodes
to pick the best move, instead of $O(b^d)$ for minimax. This means that the effective branching
factor becomes $\sqrt{b}$ instead of $b$, so we can say that the space complexity of alpha-beta
pruning in this case will be $\sqrt{b} \cdot d$ multiplied by some constant $c$. Because
alpha-beta is still a depth-first search, its storage never exceeds the $O(b \cdot d)$ of plain
minimax.