Constraint Satisfaction Problems

Chapter 6
Instructor : Miss Mahreen Nasir Butt
Outline
 Games
 Optimal decisions
 Minimax algorithm
 α-β pruning
 Imperfect, real-time decisions
Games
 Multiagent environments: any given agent needs to
consider the actions of other agents and how they affect
its own welfare
 The unpredictability of these other agents can introduce
many possible contingencies
 Environments can be competitive or cooperative
 Competitive environments, in which the agents’ goals are in
conflict, require adversarial search; these problems are
called games
Games
 In game theory (economics), any multiagent environment
(either cooperative or competitive) is a game provided
that the impact of each agent on the other is significant
 AI games are a specialized kind: deterministic, turn-taking,
two-player, zero-sum games of perfect information
 In our terminology: deterministic, fully observable
environments with two agents whose actions alternate
and in which the utility values at the end of the game are
always equal and opposite (+1 and –1)
Optimal Decisions in Games
 Consider games with two players (MAX, MIN)
 Initial State
 Includes the board position and identifies the player to move
 Successor Function
 Returns a list of (move, state) pairs, each giving a legal move and the
resulting state
 Terminal Test
 Determines if the game is over (at terminal states)
 Utility Function
 Also called the objective function or payoff function; gives a numeric
value for the terminal states, e.g. (+1, -1) or (+192, -192)
 We are not looking for a path, only the next move to make (one that hopefully
leads to a winning state)
 Our best move depends on what the other player does
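The four components above can be sketched for Tic-Tac-Toe as follows. This is only an illustrative sketch: the function names, the tuple-based board encoding, and the utility of 0 for a draw are assumptions made here, not part of the slides.

```python
from typing import List, Optional, Tuple

# Assumed encoding: a state is (board, player). The board is a tuple of 9 cells
# holding "X", "O" or None; player is the side to move. X plays the MAX role.
Board = Tuple[Optional[str], ...]
State = Tuple[Board, str]

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals


def initial_state() -> State:
    """Initial state: empty board, X (MAX) to move."""
    return (None,) * 9, "X"


def winner(board: Board) -> Optional[str]:
    """Return "X" or "O" if that side has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def successors(state: State) -> List[Tuple[int, State]]:
    """Successor function: (move, state) pairs for every legal move."""
    board, player = state
    other = "O" if player == "X" else "X"
    result = []
    for i, cell in enumerate(board):
        if cell is None:
            new_board = board[:i] + (player,) + board[i + 1:]
            result.append((i, (new_board, other)))
    return result


def terminal_test(state: State) -> bool:
    """Terminal test: someone has won or the board is full."""
    board, _ = state
    return winner(board) is not None or all(c is not None for c in board)


def utility(state: State) -> int:
    """Utility: +1 if X won, -1 if O won, 0 for a draw (draw value assumed)."""
    return {"X": 1, "O": -1, None: 0}[winner(state[0])]
```

For example, `successors(initial_state())` returns nine (move, state) pairs, one per blank square.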
Game Tree
 Root node represents the configuration of the
board at which a decision must be made
 The root is labeled a "MAX" node if it is
my turn; otherwise it is labeled a "MIN" node (your
turn)
 Each level of the tree has nodes that are all
MAX or all MIN
Game Trees
 The root of the tree is the initial state
 Next level is all of MAX’s moves
 Next level is all of MIN’s moves
 …
 Example: Tic-Tac-Toe
 Root has 9 blank squares (MAX)
 Level 1 has 8 blank squares (MIN)
 Level 2 has 7 blank squares (MAX)
 …
 Utility function:
 win for X is +1
 win for O is -1
Tic-tac-toe: Game tree
(2-player, deterministic, turns)
Optimal Strategies
 In a normal search problem, the optimal solution would be
a sequence of moves leading to a goal state - a terminal
state that is a win
 In a game, MIN has something to say about it, and
therefore MAX must find a contingent strategy, which
specifies:
 MAX’s move in the initial state,
 then MAX’s moves in the states resulting from every
possible response by MIN
Optimal strategies
 Then MAX’s moves in the states resulting
from every possible response by MIN to
those moves
 An optimal strategy leads to outcomes at
least as good as any other
 It is the strategy to use when playing an infallible
opponent
Minimax Strategy
 Basic Idea:
 Choose the move with the highest minimax value:
the best achievable payoff against best play
 Choose moves that will lead to a win, even though MIN is trying
to block
 MAX’s goal: get to +1
 MIN’s goal: get to -1
 Minimax value of a node (backed-up value):
 If N is terminal, use the utility value
 If N is a MAX node, take the maximum of its successors’ values
 If N is a MIN node, take the minimum of its successors’ values
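The three cases above can be written as a single recursive definition. The notation Successors(n) and UTILITY(n) is assumed here for the successor and utility functions introduced earlier.

```latex
\[
\text{MINIMAX-VALUE}(n) =
\begin{cases}
\text{UTILITY}(n) & \text{if } n \text{ is a terminal state} \\[4pt]
\max\limits_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MAX node} \\[4pt]
\min\limits_{s \in \text{Successors}(n)} \text{MINIMAX-VALUE}(s) & \text{if } n \text{ is a MIN node}
\end{cases}
\]
```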
Minimax
 Perfect play for deterministic games
 Idea: choose move to position with highest
minimax value = best achievable payoff against
best play
 E.g., 2-ply game:
[Figure: 2-ply game tree with nodes A, B, C, D]
Minimax value
 Given a game tree, the optimal strategy can be
determined by examining the minimax value of
each node (MINIMAX-VALUE(n))
 The minimax value of a node is the utility of being
in the corresponding state, assuming that both
players play optimally from there to the end of
the game
 Given a choice, MAX prefers to move to a state of
maximum value, whereas MIN prefers a state of
minimum value
Minimax algorithm
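A minimal sketch of the algorithm in Python, assuming the game tree is given explicitly as nested lists whose leaves are utility values. The function names and the tree encoding are illustrative choices, not the slide's original pseudocode. The example tree uses the leaf values of the 2-ply game in these slides (3, 12, 8 under B; 2, 4, 6 under C; 14, 5, 2 under D).

```python
from typing import List, Tuple, Union

# Assumed encoding: a tree is either a number (utility of a terminal state)
# or a list of subtrees (the successors of a non-terminal state).
Tree = Union[int, float, List["Tree"]]


def minimax_value(node: Tree, is_max: bool) -> float:
    """Backed-up value of a node, assuming both players play optimally."""
    if not isinstance(node, list):      # terminal: use the utility value
        return node
    values = [minimax_value(child, not is_max) for child in node]
    return max(values) if is_max else min(values)


def minimax_decision(root: List[Tree]) -> Tuple[int, float]:
    """Return (index of MAX's best move at the root, its minimax value)."""
    values = [minimax_value(child, is_max=False) for child in root]
    best = max(range(len(values)), key=lambda i: values[i])
    return best, values[best]


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # successors B, C, D of root A
move, value = minimax_decision(tree)
print("BCD"[move], value)  # prints: B 3
```

The recursion is depth-first, so only the nodes on the current path are kept in memory at any time.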
Minimax
MINIMAX-VALUE(root) = max(min(3,12,8), min(2,4,6), min(14,5,2))
= max(3,2,2)
= 3
 The algorithm first recurses down to the three bottom-left
nodes and uses the Utility function on them to discover that
their values are 3, 12 and 8.
Minimax
 Then it takes the minimum of these values, 3, and
returns it as the backed-up value of node B.
 Similar process for the other nodes.
Properties of minimax
 Complete? Yes (if the tree is finite)
 Optimal? Yes (against an optimal opponent)
 Time complexity? O(b^m), where b is the branching
factor and m is the maximum depth of the tree
 Space complexity? O(bm) (depth-first
exploration)
 For chess, b ≈ 35, m ≈ 100 for "reasonable"
games
 Exact solution completely infeasible
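To see why an exact solution is infeasible, the chess figures quoted above (b ≈ 35, m ≈ 100) can be plugged into the time bound b^m. The short calculation below is only a back-of-the-envelope check; the variable names are arbitrary.

```python
import math

b, m = 35, 100            # branching factor and depth quoted for chess

nodes = b ** m            # Python integers are arbitrary precision
digits = len(str(nodes))  # number of decimal digits in b**m

print(digits)                        # prints: 155
print(round(m * math.log10(b), 1))   # prints: 154.4, i.e. roughly 10^154 nodes
```

By contrast, the space bound O(bm) for the same figures is only on the order of 35 × 100 = 3500 nodes.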
The minimax algorithm: problems
 Problem with minimax search:
 The number of game states it has to examine
is exponential in the number of moves.
 Unfortunately, the exponent can’t be
eliminated, but it can be cut in half.
α-β pruning
 It is possible to compute the correct minimax
decision without looking at every node in the
game tree.
 Alpha-beta pruning allows us to eliminate large
parts of the tree from consideration, without
influencing the final decision.
α-β pruning
MINIMAX-VALUE(root) = max(min(3,12,8), min(2,x,y), min(14,5,2))
= max(3, min(2,x,y), 2)
= max(3, z, 2), where z ≤ 2
= 3
α-β pruning example
 It can be inferred that the value at the root is at
least 3, because MAX has a choice worth 3.
α-β pruning example
 The first leaf below C is worth 2, so C, a MIN node, is worth
at most 2, while MAX already has a choice worth 3.
 Therefore, there is no point in looking at the
other successors of C.
α-β pruning example
The first leaf below D is worth 14, so D is worth at most 14.
This is still higher than MAX’s best alternative (i.e.,
3), so D’s other successors are explored.
α-β pruning example
The second successor of D is worth 5, so the
exploration continues.
α-β pruning example
 The third successor of D is worth 2, so D is worth exactly 2.
 MAX’s decision at the root is to move to B,
giving a value of 3
Why is it called α-β?
α = the value of the best (i.e., highest-value) choice
found so far at any choice point along the path for MAX
β = the value of the best (i.e., lowest-value) choice
found so far along the path for MIN
If v is worse than α, MAX will avoid it
Prune that branch
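The role of α and β can be sketched in Python as follows. This is an illustrative sketch on an explicit tree of nested lists, not the slide's original pseudocode; the function name, the tree encoding, and the `visited` list (used only to show which leaves get evaluated) are assumptions. The example reuses the leaf values from the pruning example in these slides.

```python
import math
from typing import List, Union

# Assumed encoding: a tree is a number (terminal utility) or a list of subtrees.
Tree = Union[int, float, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float,
              is_max: bool, visited: List[float]) -> float:
    """Minimax value of node, skipping branches that cannot affect the result."""
    if not isinstance(node, list):
        visited.append(node)            # record each leaf actually evaluated
        return node
    if is_max:
        v = -math.inf
        for child in node:
            v = max(v, alphabeta(child, alpha, beta, False, visited))
            if v >= beta:               # MIN already has a better option: prune
                return v
            alpha = max(alpha, v)
        return v
    v = math.inf
    for child in node:
        v = min(v, alphabeta(child, alpha, beta, True, visited))
        if v <= alpha:                  # MAX already has a better option: prune
            return v
        beta = min(beta, v)
    return v


tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]   # successors B, C, D of the root
visited: List[float] = []
value = alphabeta(tree, -math.inf, math.inf, True, visited)
print(value)    # prints: 3
print(visited)  # prints: [3, 12, 8, 2, 14, 5, 2]
```

The leaves 4 and 6 under C are never evaluated, yet the value at the root is the same as plain minimax would return.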
Properties of α-β
 Pruning does not affect final result
 Good move ordering improves effectiveness of
pruning
 With "perfect ordering," time complexity = O(b^(m/2))
 Doubles depth of search
 A simple example of the value of reasoning about
which computations are relevant (a form of metareasoning)
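The "doubles depth of search" claim follows from rewriting the perfect-ordering bound. The worked step below uses the chess branching factor b ≈ 35 quoted earlier in these slides; the symbol d for the reachable depth is introduced here only for illustration.

```latex
\[
O\!\left(b^{m/2}\right) = O\!\left((\sqrt{b}\,)^{m}\right),
\qquad \sqrt{35} \approx 5.9
\]
\[
\text{Same budget of } b^{m} \text{ nodes: }
(\sqrt{b}\,)^{d} = b^{m} \;\Longrightarrow\; d = 2m
\]
```

In other words, with perfect ordering the effective branching factor drops from b to about √b, so chess behaves as if it had roughly 6 moves per position instead of 35.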
MinMax – AlphaBeta Pruning
 Minimax searches the entire tree, even if in some cases the rest
can be ignored
 In general, stop evaluating a move as soon as it is found to be
worse than a previously examined move
 Since playing that move cannot benefit the player, it need not
be evaluated any further
 This saves processing time without affecting the final result