Tutorial 3
Note that for the basic problem

$$\max_{\mu_0,\dots,\mu_{N-1}} E\Big\{\sum_{k=0}^{N-1} g_k(x_k,u_k,w_k) + g_N(x_N)\Big\},$$

with maximization instead of minimization, the DP algorithm is written as:

$$J_N(x_N) = g_N(x_N),$$

$$J_k(x_k) = \max_{u_k \in U_k(x_k)} E\Big\{g_k(x_k,u_k,w_k) + J_{k+1}\big(f_k(x_k,u_k,w_k)\big)\Big\}, \qquad k = N-1, N-2, \dots, 1, 0.$$
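A minimal Python sketch of this backward recursion, assuming finite state, control and disturbance sets. The function name `backward_induction` and the `outcomes` callback (which lists each value of the disturbance $w_k$ as a (probability, stage reward, next state) triple) are illustrative choices, not part of the tutorial.

```python
from collections.abc import Callable, Hashable

# One disturbance outcome: (probability, stage reward g_k, next state f_k(x, u, w))
Outcome = tuple[float, float, Hashable]


def backward_induction(
    N: int,
    states: list[Hashable],
    controls: Callable[[int, Hashable], list[Hashable]],
    outcomes: Callable[[int, Hashable, Hashable], list[Outcome]],
    terminal: Callable[[Hashable], float],
) -> tuple[list[dict], list[dict]]:
    """Finite-horizon DP with maximization; returns J[k][x] and mu[k][x]."""
    J: list[dict] = [{} for _ in range(N + 1)]
    mu: list[dict] = [{} for _ in range(N)]
    J[N] = {x: terminal(x) for x in states}  # J_N(x_N) = g_N(x_N)
    for k in range(N - 1, -1, -1):  # k = N-1, N-2, ..., 0
        for x in states:
            best_u, best_val = None, float("-inf")
            for u in controls(k, x):  # u_k ranges over U_k(x_k)
                # E{ g_k(x,u,w) + J_{k+1}(f_k(x,u,w)) } over the disturbance w_k
                val = sum(p * (g + J[k + 1][nxt]) for p, g, nxt in outcomes(k, x, u))
                if val > best_val:
                    best_u, best_val = u, val
            J[k][x], mu[k][x] = best_val, best_u
    return J, mu


# Toy check: one state, one stage; "a" pays 1 w.p. 0.5, "b" pays 0.4 for sure.
J, mu = backward_induction(
    1,
    ["s"],
    lambda k, x: ["a", "b"],
    lambda k, x, u: [(0.5, 1.0, "s"), (0.5, 0.0, "s")] if u == "a" else [(1.0, 0.4, "s")],
    lambda x: 0.0,
)
print(J[0]["s"], mu[0]["s"])  # 0.5 a
```

The only change from the minimization form is `max` in place of `min` (here, the `val > best_val` test starting from minus infinity).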
Question 3:
States: {running, broken}
Controls: {maintain, do not maintain} when running; {repair, replace} when broken
Horizon: 4 weeks, k = 0, 1, 2, 3
Terminal reward: $J_4(\cdot) = 0$
DP algorithm: $J_N(x_N) = g_N(x_N)$,

$$J_k(x_k) = \max_{u_k} E\Big\{g_k(x_k,u_k,w_k) + J_{k+1}\big(f_k(x_k,u_k,w_k)\big)\Big\}$$
Week 3:

$$J_3(\text{running}) = \max_{u_3 \in \{\text{maintain},\ \text{do not maintain}\}} E[\text{weekly profit}]$$

If $u_3$ = maintain: $E[\text{weekly profit}] = 0.6 \times 100 - 20 = 40$
If $u_3$ = do not maintain: $E[\text{weekly profit}] = 0.3 \times 100 = 30$
→ $J_3(\text{running}) = 40$ and $\mu_3^*(\text{running})$ = maintain.
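Because $J_4 = 0$, week 3 is a one-stage problem and each control is scored by its immediate expected profit alone. A quick check of the two numbers above; `Fraction` is used only to keep $0.6 \times 100$ exact, and the name `PROFIT` is illustrative.

```python
from fractions import Fraction

PROFIT = 100  # weekly profit when the machine runs through the week

# Week 3 is the last stage and J_4 = 0, so only the immediate expected profit counts.
candidates = {
    "maintain": Fraction("0.6") * PROFIT - 20,    # runs w.p. 0.6, maintenance costs 20
    "do not maintain": Fraction("0.3") * PROFIT,  # runs w.p. 0.3, no cost
}
best = max(candidates, key=candidates.get)
print(best, candidates[best])  # maintain 40
```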
$$J_3(\text{broken}) = \max_{u_3 \in \{\text{repair},\ \text{replace}\}} E[\text{weekly profit}]$$

If $u_3$ = repair: $E[\text{weekly profit}] = 0.6 \times 100 - 40 = 20$
If $u_3$ = replace: $E[\text{weekly profit}] = 100 - 150 = -50$
→ $J_3(\text{broken}) = 20$, $\mu_3^*(\text{broken})$ = repair.
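The replacement cost is not stated explicitly. Assuming, as the week-2 replace line implies, that a replaced machine runs through the week with certainty, the cost $c$ can be backed out of that line, $100 + J_3(\text{running}) - c = -10$, and used to redo the week-3 comparison. This is a consistency sketch, not part of the original solution.

```python
from fractions import Fraction

PROFIT = 100
J3_running = 40

# Week-2 "replace" entry: PROFIT + J3_running - c = -10, so c = 150 (inferred).
replace_cost = PROFIT + J3_running - (-10)

# Week 3 (J_4 = 0): compare the two controls for a broken machine.
repair = Fraction("0.6") * PROFIT - 40  # repair succeeds w.p. 0.6, costs 40
replace = PROFIT - replace_cost         # new machine runs for sure
print(replace_cost, repair, replace)    # 150 20 -50
```

With $c = 150$, replacing in the last week is a loss and repair is the better control.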
Week 2:

$$J_2(\text{running}) = \max_{u_2 \in \{\text{maintain},\ \text{do not maintain}\}} E[\text{weekly profit} + J_3(x_3)]$$

If $u_2$ = maintain: $0.6\,(100 + J_3(\text{running})) + 0.4\,(0 + J_3(\text{broken})) - 20 = 72$
If $u_2$ = do not maintain: $0.3\,(100 + J_3(\text{running})) + 0.7\,(0 + J_3(\text{broken})) = 56$
November 11, 2016
DRAFT
→ $J_2(\text{running}) = 72$, $\mu_2^*(\text{running})$ = maintain.
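Substituting $J_3(\text{running}) = 40$ and $J_3(\text{broken}) = 20$ from week 3, the arithmetic behind the two week-2 values for a running machine is:

```latex
\begin{aligned}
u_2 = \text{maintain}:&\quad 0.6\,(100+40) + 0.4\,(0+20) - 20 = 84 + 8 - 20 = 72\\
u_2 = \text{do not maintain}:&\quad 0.3\,(100+40) + 0.7\,(0+20) = 42 + 14 = 56
\end{aligned}
```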
$$J_2(\text{broken}) = \max_{u_2 \in \{\text{repair},\ \text{replace}\}} E[\text{weekly profit} + J_3(x_3)]$$

If $u_2$ = repair: $0.6\,(100 + J_3(\text{running})) + 0.4\,(0 + J_3(\text{broken})) - 40 = 52$
If $u_2$ = replace: $100 + J_3(\text{running}) - 150 = -10$
→ $J_2(\text{broken}) = 52$, $\mu_2^*(\text{broken})$ = repair.
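Substituting the week-3 values $J_3(\text{running}) = 40$ and $J_3(\text{broken}) = 20$ into the repair and replace expressions gives:

```latex
\begin{aligned}
u_2 = \text{repair}:&\quad 0.6\,(100+40) + 0.4\,(0+20) - 40 = 84 + 8 - 40 = 52\\
u_2 = \text{replace}:&\quad 100 + 40 - 150 = -10
\end{aligned}
```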
Week 1:
Similar calculations give:
$J_1(\text{running}) = 104$, $\mu_1^*(\text{running})$ = maintain
$J_1(\text{broken}) = 84$, $\mu_1^*(\text{broken})$ = repair
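The week-1 working is omitted above. Redoing it with $J_2(\text{running}) = 72$ and $J_2(\text{broken}) = 52$, and assuming the same per-week probabilities and costs that appear in the week-2 and week-3 calculations, gives:

```latex
\begin{aligned}
\text{running, maintain}:&\quad 0.6\,(100+72) + 0.4\,(0+52) - 20 = 103.2 + 20.8 - 20 = 104\\
\text{running, do not maintain}:&\quad 0.3\,(100+72) + 0.7\,(0+52) = 51.6 + 36.4 = 88\\
\text{broken, repair}:&\quad 0.6\,(100+72) + 0.4\,(0+52) - 40 = 103.2 + 20.8 - 40 = 84\\
\text{broken, replace}:&\quad 100 + 72 - 150 = 22
\end{aligned}
```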
Week 0:
The machine is new, so it earns the weekly profit of 100 and is running at the start of week 1:
$J_0(\text{new machine}) = 100 + J_1(\text{running}) = 204.$
Summary of optimal policy:
If running, always maintain.
If broken, always repair.
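A compact end-to-end check of this summary, as a sketch under assumed parameters. The success probabilities (0.6, 0.3, 0.6, 1) and costs (20, 0, 40, 150) are read off from the weekly calculations rather than quoted from the problem statement, and `OPTIONS` and `solve` are hypothetical names. The sketch runs the recursion from $J_4 = 0$ back to week 1, then adds week 0 for the new machine.

```python
from fractions import Fraction

PROFIT = 100  # weekly profit when the machine runs through the week
# Assumed parameters, read off from the weekly calculations:
# control -> (probability the machine runs through the week, control cost)
OPTIONS = {
    "running": {"maintain": (Fraction("0.6"), 20), "do not maintain": (Fraction("0.3"), 0)},
    "broken": {"repair": (Fraction("0.6"), 40), "replace": (Fraction(1), 150)},
}


def solve(last_week=3):
    """Backward recursion from J_4 = 0; returns J_0(new machine) and mu*_k(x)."""
    J = {"running": 0, "broken": 0}  # terminal reward J_4(.) = 0
    policy = {}
    for k in range(last_week, 0, -1):  # weeks 3, 2, 1
        new_J = {}
        for state, controls in OPTIONS.items():
            # E[weekly profit + J_{k+1}(x_{k+1})]: runs -> PROFIT, running; fails -> 0, broken
            q = {u: p * (PROFIT + J["running"]) + (1 - p) * J["broken"] - c
                 for u, (p, c) in controls.items()}
            policy[(k, state)] = max(q, key=q.get)
            new_J[state] = q[policy[(k, state)]]
        J = new_J
    # Week 0: the new machine earns PROFIT and is running at the start of week 1
    return PROFIT + J["running"], policy


J0, policy = solve()
print(J0)  # 204
for (k, state), u in sorted(policy.items()):
    print(f"week {k}, {state}: {u}")
```

In every week the maximizing control is maintain for a running machine and repair for a broken one, which agrees with the values of $J_3$, $J_2$ and $J_1$ computed above.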