DP for optimum strategies in games

J.-S. Roger Jang (張智星)
http://mirlab.org/jang
MIR Lab, CSIE Dept.
National Taiwan University
Outline
Game of dice sum
Game of colored Jenga

Game of Dice Sum

Description


Toss a die 8 times. Right after each toss, place the value into
an unused digit position of four two-digit numbers. Find the total of these 4
numbers. If the total is bigger than 150, your score is 0.
Otherwise, your score is the total.
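To pin down the scoring rule, here is a minimal Python sketch (the function name `score` is hypothetical, not from the course) that scores a finished game from its four tens digits and four ones digits. Because each two-digit number equals 10 × (tens digit) + (ones digit), the total depends only on the two digit sums.

```python
def score(tens, ones):
    """Score of a finished game: tens holds the 4 tens digits, ones the 4 ones digits."""
    total = 10 * sum(tens) + sum(ones)  # sum of the four two-digit numbers
    return 0 if total > 150 else total   # a total above 150 scores 0


print(score([1, 2, 3, 4], [5, 6, 1, 2]))
```

For example, the numbers 15, 26, 31 and 42 give tens digits [1, 2, 3, 4] and ones digits [5, 6, 1, 2], for a total (and score) of 114.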
Your goal

Find the optimum strategy for playing the game such that the
expected score is maximized.
Credit: Peter Norvig at Google
CS283: AI Programming Techniques
(1989 at UC Berkeley)
Three-step Formula of DP: Step 1

Optimum-value function

D(p, q, s) = maximum expected final score, given
p: number of tens positions left
q: number of ones positions left
s: current sum of the game

Credit: 賀正翔 (Dept. of Electrical Engineering)
Figure: game state (p, q, s) = (1, 2, 67)
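Why three numbers suffice for the state: the final total is 10 × (sum of tens digits) + (sum of ones digits), so the rest of the game depends only on how many positions of each kind remain and on the current sum, not on which of the four numbers received which digit. A minimal sketch with a hypothetical helper `to_state`; the digits used below are made up (the figure's actual digits are not available), chosen only so that the result matches the state (1, 2, 67).

```python
def to_state(tens_placed, ones_placed):
    """Map a partially played game to the DP state (p, q, s).

    tens_placed / ones_placed: digits already written into tens / ones positions.
    """
    p = 4 - len(tens_placed)                      # tens positions still empty
    q = 4 - len(ones_placed)                      # ones positions still empty
    s = 10 * sum(tens_placed) + sum(ones_placed)  # current sum of the game
    return (p, q, s)


print(to_state([1, 2, 3], [3, 4]))
```

Reordering the placed digits, for instance `to_state([3, 1, 2], [4, 3])`, yields the same state, which is exactly why the DP does not need to track the individual numbers.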
Three-step Formula of DP: Steps 2 and 3

Recurrence formula for the optimum-value function
General recurrence, where k is the die value and each face has probability 1/6:
$$D(p, q, s) = \sum_{k=1}^{6} \frac{1}{6} \max\bigl\{ D(p-1, q, s+10k),\ D(p, q-1, s+k) \bigr\}$$
Boundary condition:
$$D(p, q, s) = 0 \quad \text{if } s > 150,\ \forall p, q$$
(more to be added)

Answer: D(4, 4, 0)
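The recurrence can be evaluated top-down with memoization. Below is a minimal Python sketch under stated assumptions: since the boundary conditions above are incomplete, it assumes that a finished game with s ≤ 150 is worth s, i.e. D(0, 0, s) = s, and that once one kind of position is used up, only the other option remains. `functools.lru_cache` serves as the memo table; the state space is tiny (p, q ≤ 4 and s ≤ 210).

```python
from functools import lru_cache

LIMIT = 150  # a total above this scores 0


@lru_cache(maxsize=None)
def D(p, q, s):
    """Maximum expected final score with p tens and q ones positions left
    and current sum s."""
    if s > LIMIT:            # already bust: the score will be 0 regardless
        return 0.0
    if p == 0 and q == 0:    # assumed terminal condition: D(0, 0, s) = s
        return float(s)
    total = 0.0
    for k in range(1, 7):    # die value k, each with probability 1/6
        options = []
        if p > 0:
            options.append(D(p - 1, q, s + 10 * k))  # put k in a tens position
        if q > 0:
            options.append(D(p, q - 1, s + k))       # put k in a ones position
        total += max(options) / 6
    return total


print(D(4, 4, 0))  # expected score under optimal play
```

As a quick check by hand, D(0, 1, 148) = (149 + 150) / 6, because die values 3 to 6 push the total past 150.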
Strategy during the Game

Apply the optimum-value function from the recurrence after each toss. Example state:
(p, q, s) = (1, 2, 100)
Given that the die shows 4, our strategy is:
pos = D(0, 2, 140) > D(1, 1, 104) ? tens : ones;
Game of Colored Jenga

Description:


http://codeforces.com/problemset/problem/424/E
Techniques
Dynamic programming
Hash table
