A Series of Lectures on Approximate Dynamic Programming

Dimitri P. Bertsekas
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Lucca, Italy
June 2017
Bertsekas (M.I.T.)
Approximate Dynamic Programming
1 / 24
Our Aim
Discuss optimization by Dynamic Programming (DP)
and the use of approximations
Purpose: Computational tractability in a broad variety of practical contexts
The Scope of these Lectures
After an introduction to exact DP, we will focus on approximate DP for optimal
control under stochastic uncertainty
The subject is broad, with a rich variety of theory/math, algorithms, and applications
Applications come from a vast array of areas: control/robotics/planning, operations
research, economics, artificial intelligence, and beyond ...
We will concentrate on control of discrete-time systems with a finite number of
stages (a finite horizon), and the expected value criterion
We will focus mostly on algorithms ... less on theory and modeling
We will not cover:
Infinite horizon problems
Imperfect state information and minimax/game problems
Simulation-based methods: reinforcement learning, neuro-dynamic programming
A series of video lectures on the latter can be found at the author’s web site
Reference: The lectures will follow Chapters 1 and 6 of the author’s book
“Dynamic Programming and Optimal Control,” Vol. I, Athena Scientific, 2017
Lectures Plan
Exact DP
The basic problem formulation
Some examples
The DP algorithm for finite horizon problems with perfect state information
Computational limitations; motivation for approximate DP
Approximate DP - I
Approximation in value space; limited lookahead
Parametric cost approximation, including neural networks
Q-factor approximation, model-free approximate DP
Problem approximation
Approximate DP - II
Simulation-based on-line approximation; rollout and Monte Carlo tree search
Applications in backgammon and AlphaGo
Approximation in policy space
First Lecture
EXACT DYNAMIC PROGRAMMING
Outline
1. Basic Problem
2. Some Examples
3. The DP Algorithm
4. Approximation Ideas
Basic Problem Structure for DP
Discrete-time system
$$x_{k+1} = f_k(x_k, u_k, w_k), \qquad k = 0, 1, \ldots, N-1$$
$x_k$: State; summarizes past information that is relevant for future optimization at time $k$
$u_k$: Control; decision to be selected at time $k$ from a given set $U_k(x_k)$
$w_k$: Disturbance; random parameter with distribution $P(w_k \mid x_k, u_k)$
For deterministic problems there is no $w_k$
Cost function that is additive over time
$$E\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) \right\}$$
Perfect state information
The control uk is applied with (exact) knowledge of the state xk
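As a rough illustration of the problem structure above, the following Python sketch simulates one trajectory of the system and accumulates the additive cost, then approximates the expectation by a plain sample average. The function names (`simulate_cost`, `estimate_expected_cost`) and the toy stock-keeping problem at the bottom are assumptions made for this example only; they are not taken from the lectures.

```python
import random

def simulate_cost(x0, N, f, g, g_N, control_rule, sample_w, rng):
    """Return the cost of ONE simulated trajectory.

    f(k, x, u, w)          -> next state x_{k+1}
    g(k, x, u, w)          -> stage cost g_k
    g_N(x)                 -> terminal cost
    control_rule(k, x)     -> control u_k, chosen with knowledge of the state x_k
    sample_w(k, x, u, rng) -> disturbance w_k drawn from P(. | x_k, u_k)
    """
    x, total = x0, 0.0
    for k in range(N):
        u = control_rule(k, x)
        w = sample_w(k, x, u, rng)
        total += g(k, x, u, w)
        x = f(k, x, u, w)
    return total + g_N(x)

def estimate_expected_cost(num_samples, seed=0, **problem):
    """Sample-average approximation of the expected additive cost."""
    rng = random.Random(seed)
    costs = [simulate_cost(rng=rng, **problem) for _ in range(num_samples)]
    return sum(costs) / num_samples

# HYPOTHETICAL toy problem (all numbers are illustrative assumptions):
# stock x, order u, random demand w, cost = order size + |leftover stock|.
toy = dict(
    x0=0,
    N=3,
    f=lambda k, x, u, w: x + u - w,
    g=lambda k, x, u, w: u + abs(x + u - w),
    g_N=lambda x: 0.0,
    control_rule=lambda k, x: max(0, 2 - x),      # order up to a level of 2
    sample_w=lambda k, x, u, rng: rng.choice([0, 1, 2]),
)

if __name__ == "__main__":
    print(estimate_expected_cost(10_000, **toy))
```

Passing the control as a function of `(k, x)` rather than as a fixed number mirrors the perfect state information assumption: the decision at time k may depend on the state actually reached.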
Optimization over Feedback Policies
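[Figure: closed-loop block diagram. The system $x_{k+1} = f_k(x_k, u_k, w_k)$ is driven by the disturbance $w_k$; the state $x_k$ is fed back through $\mu_k$ to produce the control $u_k = \mu_k(x_k)$.]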
Feedback policies: Rules that specify the control to apply at each possible state $x_k$ that can occur
Major distinction: We minimize over sequences of functions of state $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$, with $u_k = \mu_k(x_k) \in U_k(x_k)$, not sequences of controls $\{u_0, u_1, \ldots, u_{N-1}\}$
Cost of a policy $\pi = \{\mu_0, \mu_1, \ldots, \mu_{N-1}\}$ starting at initial state $x_0$
$$J_\pi(x_0) = E\left\{ g_N(x_N) + \sum_{k=0}^{N-1} g_k\big(x_k, \mu_k(x_k), w_k\big) \right\}$$
Optimal cost function:
$$J^*(x_0) = \min_\pi J_\pi(x_0)$$
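To make the distinction between feedback policies and fixed control sequences concrete, the sketch below compares the two on a tiny two-stage problem by exhaustive enumeration. The problem itself is a hypothetical example chosen for this note (state update x + u + w, controls in {-1, 0, 1}, disturbances ±1 with equal probability, terminal cost x², zero stage costs); none of these numbers come from the lectures.

```python
import itertools

# HYPOTHETICAL two-stage problem, chosen only for illustration.
CONTROLS = (-1, 0, 1)
DISTURBANCES = (-1, 1)      # each value has probability 1/2
N = 2
X0 = 0

def expected_cost(policy):
    """Exact expected terminal cost x_N^2 under policy(k, x) -> u."""
    total = 0.0
    for ws in itertools.product(DISTURBANCES, repeat=N):
        x = X0
        for k, w in enumerate(ws):
            x = x + policy(k, x) + w
        total += x * x
    return total / len(DISTURBANCES) ** N

def best_open_loop():
    """Minimize over fixed control sequences {u_0, u_1}."""
    return min(
        expected_cost(lambda k, x, us=us: us[k])
        for us in itertools.product(CONTROLS, repeat=N)
    )

def best_feedback():
    """Minimize over policies {mu_0, mu_1}, where mu_1 is a function of x_1."""
    states1 = sorted({X0 + u + w for u in CONTROLS for w in DISTURBANCES})
    best = float("inf")
    for u0 in CONTROLS:
        for table in itertools.product(CONTROLS, repeat=len(states1)):
            mu1 = dict(zip(states1, table))
            cost = expected_cost(lambda k, x: u0 if k == 0 else mu1[x])
            best = min(best, cost)
    return best

if __name__ == "__main__":
    print("best open-loop cost:", best_open_loop())
    print("best feedback cost: ", best_feedback())
```

In this example the best feedback policy halves the best open-loop cost, because the second control can cancel the state produced by the first disturbance. Brute-force enumeration of policies only works for toy sizes, which is part of the motivation for the DP algorithm.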
Scope of DP
Any optimization (deterministic, stochastic, minimax, etc.) involving a sequence of
decisions fits the framework
A continuous-state example: Linear-quadratic optimal control
Linear discrete-time system: $x_{k+1} = A x_k + B u_k + w_k$, $k = 0, \ldots, N-1$
$x_k \in \Re^n$: The state at time $k$
$u_k \in \Re^m$: The control at time $k$ (no constraints in the classical version)
$w_k \in \Re^n$: The disturbance at time $k$ ($w_0, \ldots, w_{N-1}$ are independent random variables with given distribution)
Quadratic Cost Function
$$E\left\{ x_N' Q x_N + \sum_{k=0}^{N-1} \left( x_k' Q x_k + u_k' R u_k \right) \right\}$$
where Q and R are positive definite symmetric matrices
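The slide only poses the linear-quadratic problem. As a hedged illustration of where it leads, the sketch below runs the standard backward Riccati recursion from the textbook literature for the scalar case (n = m = 1), under the added assumption that the disturbances have zero mean, so that they do not change the optimal feedback gains. The function name `scalar_lq_gains` and the numbers in the usage lines are assumptions for this example.

```python
def scalar_lq_gains(a, b, q, r, N):
    """Backward Riccati recursion, scalar case.

    System: x_{k+1} = a*x_k + b*u_k + w_k, with zero-mean w_k (assumption).
    Cost:   E{ q*x_N^2 + sum_{k<N} (q*x_k^2 + r*u_k^2) }.

    Returns (K, L), where u_k = L[k] * x_k is the feedback control and
    K[k] * x^2 is the part of the cost-to-go that depends on the state.
    """
    K = [0.0] * (N + 1)
    L = [0.0] * N
    K[N] = q
    for k in range(N - 1, -1, -1):          # go backwards in time
        denom = b * b * K[k + 1] + r
        L[k] = -a * b * K[k + 1] / denom
        K[k] = a * a * (K[k + 1] - (b * K[k + 1]) ** 2 / denom) + q
    return K, L

if __name__ == "__main__":
    K, L = scalar_lq_gains(a=1.0, b=1.0, q=1.0, r=1.0, N=2)
    print("K:", K)
    print("L:", L)
```

The resulting policy is linear in the state, which is why this continuous-state problem can be solved exactly even though the state space is infinite.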
Discrete-State Deterministic Scheduling Example
[Figure: state-transition graph of partial schedules, from the Initial State through A, C; then AB, AC, CA, CD; then ABC, ACB, ACD, CAB, CAD, CDA, with a cost on each arc.]
Find optimal sequence of operations A, B, C, D (A must precede B and C must precede D)
DP Problem Formulation
States: Partial schedules; Controls: Stage 0, 1, and 2 decisions
DP idea: Break down the problem into smaller pieces (tail subproblems)
Start from the last decision and go backwards
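The state space of the scheduling example can be generated mechanically from the precedence constraints. The sketch below is one minimal way to do it, assuming partial schedules are represented as strings of operation letters; the names `REQUIRES`, `successors`, and `states_by_stage` are introduced here for illustration only.

```python
OPS = "ABCD"
# Precedence constraints: B may only follow A, D may only follow C.
REQUIRES = {"B": "A", "D": "C"}

def successors(schedule):
    """Partial schedules reachable by appending one feasible operation."""
    result = []
    for op in OPS:
        if op in schedule:
            continue                      # already performed
        needed = REQUIRES.get(op)
        if needed is None or needed in schedule:
            result.append(schedule + op)
    return result

def states_by_stage(num_stages=3):
    """States (partial schedules) grouped by stage, starting from the empty schedule."""
    stages = [[""]]
    for _ in range(num_stages):
        stages.append([t for s in stages[-1] for t in successors(s)])
    return stages

if __name__ == "__main__":
    for k, states in enumerate(states_by_stage()):
        print(k, states)
```

Only three stages of decisions are needed: once three operations are scheduled, the fourth is forced.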
Scheduling Example Algorithm I
[Figure: state-transition graph of the scheduling example.]
Solve the stage 2 subproblems (using the terminal costs)
At each state of stage 2, we record the optimal cost-to-go and the optimal decision
Scheduling Example Algorithm II
[Figure: state transition graph of the scheduling example, from the initial state through the partial schedules A, C, AB, AC, CA, CD, ABC, ACB, ACD, CAB, CAD, CDA, with a cost on each arc]
Solve the stage 1 subproblems (using the solution of stage 2 subproblems)
At each state of stage 1, we record the optimal cost-to-go and the optimal decision
Scheduling Example Algorithm III
[Figure: state transition graph of the scheduling example, from the initial state through the partial schedules A, C, AB, AC, CA, CD, ABC, ACB, ACD, CAB, CAD, CDA, with a cost on each arc]
Solve the stage 0 subproblem (using the solution of stage 1 subproblems)
The stage 0 subproblem is the entire problem
The optimal value of the stage 0 subproblem is the optimal cost $J^*$(initial state)
Construct the optimal sequence going forward
Principle of Optimality
Let π ∗ = {µ∗0 , µ∗1 , . . . , µ∗N−1 } be an optimal policy
Consider the “tail subproblem" whereby we are at xk at time k and wish to
minimize the “cost-to-go” from time k to time N
$$E\Big\{ g_N(x_N) + \sum_{m=k}^{N-1} g_m\big(x_m, \mu_m(x_m), w_m\big) \Big\}$$
Consider the “tail” $\{\mu_k^*, \mu_{k+1}^*, \ldots, \mu_{N-1}^*\}$ of the optimal policy
[Figure: time axis from 0 to N; the tail subproblem starts at state $x_k$ at time k and runs to time N]
THE TAIL OF AN OPTIMAL POLICY IS OPTIMAL FOR THE TAIL SUBPROBLEM
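The reasoning behind this statement can be sketched in two lines. The symbol $J_k^{\pi}$ (cost of a policy $\pi$ on the tail subproblem starting at $x_k$) is notation introduced here for illustration, and the argument assumes that $x_k$ is reached with positive probability under $\pi^*$.

```latex
\begin{aligned}
J_k^{\pi}(x_k) &= E\Big\{ g_N(x_N) + \sum_{m=k}^{N-1} g_m\big(x_m, \mu_m(x_m), w_m\big) \Big\}, \\
J_0^{\pi}(x_0) &= E\Big\{ \sum_{m=0}^{k-1} g_m\big(x_m, \mu_m(x_m), w_m\big) + J_k^{\pi}(x_k) \Big\}.
\end{aligned}
```

If some other tail policy had strictly smaller cost than the tail of $\pi^*$ at such an $x_k$, then using $\mu_0^*, \ldots, \mu_{k-1}^*$ up to time k and switching to that tail policy afterwards would strictly lower $J_0(x_0)$, contradicting the optimality of $\pi^*$.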
DP Algorithm
Start with the last tail (stage N − 1) subproblems
Solve the stage k tail subproblems, using the optimal costs-to-go of the stage
(k + 1) tail subproblems
The optimal value of the stage 0 subproblem is the optimal cost J ∗ (initial state)
In the process construct the optimal policy
Formal Statement of the DP Algorithm
Computes for all k and states xk : Jk (xk ): opt. cost of tail problem that starts at xk
Go backwards, k = N − 1, . . . , 0, using
$$J_N(x_N) = g_N(x_N),$$
$$J_k(x_k) = \min_{u_k \in U_k(x_k)} E_{w_k}\Big\{ g_k(x_k, u_k, w_k) + J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}$$
Interpretation: To solve a tail problem that starts at state xk
Minimize the (k th-stage cost + Opt. cost of the tail problem that starts at state xk +1 )
Notes:
J0 (x0 ) = J ∗ (x0 ): The cost computed at the last step of the recursion is equal to the optimal cost
Let µ∗k (xk ) attain the minimum on the right side above for each xk and k . Then the policy
π ∗ = {µ∗0 , . . . , µ∗N−1 } is optimal
Proof by induction
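The backward recursion above can be sketched in a few lines of Python for problems with finitely many states, controls, and disturbance values. This is a minimal illustration, not the author's implementation: the function name `backward_dp`, its argument conventions, and the toy inventory problem used to exercise it (capacity 2, three demand values, horizon N = 3) are all assumptions made for the example.

```python
def backward_dp(N, states, controls, disturbances, f, g, g_terminal):
    """Finite-horizon stochastic DP on finite sets (illustrative sketch).

    controls(k, x)        -> iterable of admissible controls U_k(x)
    disturbances(k, x, u) -> iterable of (w, probability) pairs
    f(k, x, u, w)         -> next state x_{k+1}
    g(k, x, u, w)         -> stage cost
    g_terminal(x)         -> terminal cost g_N(x)
    Returns (J, mu): J[k][x] is the optimal cost-to-go, mu[k][x] a minimizing control.
    """
    J = [dict() for _ in range(N + 1)]
    mu = [dict() for _ in range(N)]
    for x in states:
        J[N][x] = g_terminal(x)          # J_N(x_N) = g_N(x_N)
    for k in range(N - 1, -1, -1):       # k = N-1, ..., 0
        for x in states:
            best_u, best_q = None, float("inf")
            for u in controls(k, x):
                # Expected stage cost plus cost-to-go of the next state
                q = sum(p * (g(k, x, u, w) + J[k + 1][f(k, x, u, w)])
                        for w, p in disturbances(k, x, u))
                if q < best_q:
                    best_u, best_q = u, q
            J[k][x], mu[k][x] = best_q, best_u
    return J, mu


# Hypothetical toy inventory problem (all numbers are assumptions for the demo):
# state = stock level in {0, 1, 2}, control = units ordered, w = demand.
CAPACITY = 2
DEMAND = [(0, 0.1), (1, 0.7), (2, 0.2)]

def controls(k, x):
    return range(CAPACITY - x + 1)       # cannot exceed capacity

def disturbances(k, x, u):
    return DEMAND

def f(k, x, u, w):
    return max(0, x + u - w)             # unmet demand is lost

def g(k, x, u, w):
    return u + (x + u - w) ** 2          # ordering cost + holding/shortage cost

def g_terminal(x):
    return 0.0

J, mu = backward_dp(3, [0, 1, 2], controls, disturbances, f, g, g_terminal)
```

For this toy data, `J[0][x]` holds the optimal cost from each initial stock level and `mu[k][x]` gives a minimizing order quantity at each stage and state. The table `J` has one entry per stage and state, which is exactly what becomes impractical when the state space is large.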
Practical Difficulties of DP
The curse of dimensionality (too many values of xk )
In continuous-state problems:
Discretization needed
Exponential growth of the computation with the dimensions of the state and control spaces
In naturally discrete/combinatorial problems: Quick explosion of the number of
states as the search space increases
Length of the horizon (what if it is infinite?)
The curse of modeling; we may not know exactly fk and P(xk +1 | xk , uk )
It is often hard to construct an accurate math model of the problem
Sometimes a simulator of the system is easier to construct than a model
The problem data may not be known well in advance
A family of problems may be addressed. The data of the problem to be solved is
given with little advance notice
The problem data may change as the system is controlled – need for on-line
replanning and fast solution
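To make the curse of dimensionality concrete, the short sketch below counts grid points and DP operations for a discretized problem. The function names and the choice of a uniform grid with the same number of points per dimension are assumptions made for illustration only.

```python
def grid_points(points_per_dim, dim):
    """Number of points in a uniform grid over a `dim`-dimensional space."""
    return points_per_dim ** dim


def dp_operations(points_per_dim, state_dim, control_dim, n_disturbances, horizon):
    """Rough count of (state, control, disturbance) evaluations in exact DP.

    One evaluation of g_k + J_{k+1}(f_k) is counted per stage, grid state,
    grid control, and disturbance value.
    """
    n_states = grid_points(points_per_dim, state_dim)
    n_controls = grid_points(points_per_dim, control_dim)
    return horizon * n_states * n_controls * n_disturbances


# Grid sizes for 100 points per dimension and state dimensions 1 through 6
sizes = {n: grid_points(100, n) for n in range(1, 7)}
```

With 100 points per dimension, `sizes` grows from 100 entries for a one-dimensional state to 10^12 entries for a six-dimensional state, before controls and disturbances are even counted.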
Approximation in Value Space
A MAJOR IDEA: Cost Approximation
IF we knew Jk +1 , the computation of Jk would be much simpler
Replace Jk +1 by an approximation J˜k +1
Apply ūk that attains the minimum in
$$\min_{u_k \in U_k(x_k)} E\Big\{ g_k(x_k, u_k, w_k) + \tilde J_{k+1}\big(f_k(x_k, u_k, w_k)\big) \Big\}$$
This is called one-step lookahead; an extension is multistep lookahead
A variety of approximation approaches (and combinations thereof):
Parametric cost-to-go approximation: Use as J˜k a parametric function J˜k (xk , rk )
(e.g., a neural network), whose parameter rk is “tuned" by some scheme
Problem approximation: Use J˜k derived from a related but simpler problem
Rollout: Use as J˜k the cost of some suboptimal policy, which is calculated either
analytically or by simulation
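One-step lookahead can be sketched as a single minimization over the controls at the current state, with the cost-to-go approximation passed in as a function. This is a minimal sketch under stated assumptions: the function name `one_step_lookahead`, the toy inventory problem, and both choices of the approximation (`J_tilde_zero` and `J_tilde_penalty`) are hypothetical and chosen only to show how the pieces fit together.

```python
def one_step_lookahead(k, x, controls, disturbances, f, g, J_tilde):
    """Return (u_bar, q_bar): a control minimizing
    E{ g_k(x, u, w) + J_tilde(k + 1, f_k(x, u, w)) } and the minimum value."""
    best_u, best_q = None, float("inf")
    for u in controls(k, x):
        q = sum(p * (g(k, x, u, w) + J_tilde(k + 1, f(k, x, u, w)))
                for w, p in disturbances(k, x, u))
        if q < best_q:
            best_u, best_q = u, q
    return best_u, best_q


# Hypothetical toy inventory problem (assumed data, for illustration only)
CAPACITY = 2
DEMAND = [(0, 0.1), (1, 0.7), (2, 0.2)]

def controls(k, x):
    return range(CAPACITY - x + 1)

def disturbances(k, x, u):
    return DEMAND

def f(k, x, u, w):
    return max(0, x + u - w)

def g(k, x, u, w):
    return u + (x + u - w) ** 2

def J_tilde_zero(k, x):
    # Crudest approximation: ignore the future entirely (myopic control)
    return 0.0

def J_tilde_penalty(k, x):
    # Assumed heuristic: running out of stock is judged to cost 1 unit later
    return 1.0 if x == 0 else 0.0

u_myopic, q_myopic = one_step_lookahead(0, 0, controls, disturbances, f, g, J_tilde_zero)
u_pen, q_pen = one_step_lookahead(0, 0, controls, disturbances, f, g, J_tilde_penalty)
```

Only the current state is processed on-line, so no table over all states is needed. Any of the approaches listed above (parametric approximation, problem approximation, rollout) would enter through the `J_tilde` argument.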
Approximation in Policy Space
ANOTHER MAJOR IDEA: Policy approximation
Parametrize the set of policies by a parameter vector r = (r0 , . . . , rN−1 ) (e.g., a neural network):
$$\pi(r) = \big\{ \tilde\mu_0(x_0, r_0), \ldots, \tilde\mu_{N-1}(x_{N-1}, r_{N-1}) \big\}$$
Minimize the cost $J_{\pi(r)}(x_0)$ over r
A related possibility
Compute a set of many state-control pairs $(x_k^s, u_k^s)$, s = 1, . . . , q, such that for each s, $u_k^s$ is a “good” control at state $x_k^s$
Possibly use approximation in value space (or other “expert” scheme)
Approximate in policy space by solving for each k the least squares problem
$$\min_{r_k} \sum_{s=1}^{q} \big\| u_k^s - \tilde\mu_k(x_k^s, r_k) \big\|^2$$
where $\tilde\mu_k(x_k, r_k)$ is an “approximation architecture”
A link between approximation in value and policy space
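The least squares step can be sketched for the simplest possible architecture, a scalar policy that is linear in the state, $\tilde\mu_k(x, r_k) = r_{k,0} + r_{k,1} x$. The architecture, the function names, and the sample pairs below are assumptions chosen for illustration; in practice the pairs would come from an expert scheme such as one-step lookahead, and the architecture could be a neural network.

```python
def fit_linear_policy(samples):
    """Fit mu_tilde(x, r) = r0 + r1 * x to (x^s, u^s) pairs by least squares.

    Solves the normal equations in closed form; `samples` is a list of
    (state, control) pairs with scalar entries.
    """
    q = len(samples)
    sx = sum(x for x, _ in samples)
    su = sum(u for _, u in samples)
    sxx = sum(x * x for x, _ in samples)
    sxu = sum(x * u for x, u in samples)
    denom = q * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct sample states")
    r1 = (q * sxu - sx * su) / denom
    r0 = (su - r1 * sx) / q
    return r0, r1


def mu_tilde(x, r):
    """Evaluate the fitted policy at state x."""
    r0, r1 = r
    return r0 + r1 * x


# Hypothetical "good" state-control pairs for one stage k
samples_k = [(0.0, 2.0), (1.0, 1.0), (2.0, 0.0)]
r_k = fit_linear_policy(samples_k)
```

Once `r_k` has been fitted for each stage, the control at a new state is obtained by evaluating `mu_tilde` directly, with no on-line minimization.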
Perspective on Approximate DP
The connection of theory and algorithms (convergence, rate of convergence,
complexity, etc.) is solid for exact DP and most of optimization
By contrast, for approximate DP, the connection of theory and algorithms is fragile
Some approximate DP algorithms have been able to solve impressively difficult
problems, yet we often do not fully understand why
There are success stories without theory
There is theory without success stories
The theory available is interesting but may involve some assumptions not always
satisfied in practice
The challenge is how to bring to bear the right mix from a broad array of methods
and theoretical ideas
Implementation is often an art; there are no guarantees of success
There is no safety in love, war, and approximate DP!