Tutorial 3

Draft, November 11, 2016

Note that for the basic problem

$$\max_{\mu_0, \ldots, \mu_{N-1}} E\Big\{ \sum_{k=0}^{N-1} g_k(x_k, u_k, w_k) + g_N(x_N) \Big\},$$

with maximization instead of minimization, the DP algorithm is written as:

$$J_N(x_N) = g_N(x_N),$$

$$J_k(x_k) = \max_{u_k \in U_k(x_k)} E\big\{ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \big\}, \quad k = N-1, N-2, \ldots, 1, 0.$$

Question 3:

States = {running, broken}
Controls = {maintain, do not maintain, repair, replace}
4 weeks: k = 0, 1, 2, 3.
Terminal reward: $J_4(\cdot) = 0$.

DP algorithm:

$$J_N(x_N) = g_N(x_N), \qquad J_k(x_k) = \max_{u_k} E\big\{ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) \big\}.$$

Week 3:

$$J_3(\text{running}) = \max_{u_3 \in \{\text{maintain},\ \text{do not maintain}\}} E[\text{weekly profit}]$$

If $u_3$ = maintain: $E[\text{weekly profit}] = 0.6 \times 100 - 20 = 40$
If $u_3$ = do not maintain: $E[\text{weekly profit}] = 0.3 \times 100 = 30$
$\to J_3(\text{running}) = 40$, and $\mu_3^*(\text{running})$ = maintain.

$$J_3(\text{broken}) = \max_{u_3 \in \{\text{repair},\ \text{replace}\}} E[\text{weekly profit}]$$

If $u_3$ = repair: $E[\text{profit}] = 0.6 \times 100 - 40 = 20$
If $u_3$ = replace: $E[\text{profit}] = 100 - 150 = -50$
$\to J_3(\text{broken}) = 20$, and $\mu_3^*(\text{broken})$ = repair.

Week 2:

$$J_2(\text{running}) = \max_{u_2 \in \{\text{maintain},\ \text{do not maintain}\}} E[\text{weekly profit} + J_3(x_3)]$$

If $u_2$ = maintain: $0.6 \times (100 + J_3(\text{running})) + 0.4 \times (0 + J_3(\text{broken})) - 20 = 72$
If $u_2$ = do not maintain: $0.3 \times (100 + J_3(\text{running})) + 0.7 \times (0 + J_3(\text{broken})) = 56$
$\to J_2(\text{running}) = 72$, and $\mu_2^*(\text{running})$ = maintain.

$$J_2(\text{broken}) = \max_{u_2 \in \{\text{repair},\ \text{replace}\}} E[\text{weekly profit} + J_3(x_3)]$$

If $u_2$ = repair: $0.6 \times (100 + J_3(\text{running})) + 0.4 \times (0 + J_3(\text{broken})) - 40 = 52$
If $u_2$ = replace: $100 + J_3(\text{running}) - 150 = -10$
$\to J_2(\text{broken}) = 52$, and $\mu_2^*(\text{broken})$ = repair.

Week 1:

Similar calculations give:
$J_1(\text{running}) = 104$, $\mu_1^*(\text{running})$ = maintain
$J_1(\text{broken}) = 84$, $\mu_1^*(\text{broken})$ = repair

Week 0:

New machine: $J_0(\text{new machine}) = 100 + J_1(\text{running}) = 204$.

Summary of optimal policy: If running, always maintain. If broken, always repair.
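The backward recursion for Question 3 can also be checked numerically. The following is a minimal Python sketch, not part of the original solution. The function name `solve`, the dictionary layout, and the constant names are hypothetical. The numbers are assumptions read off the worked calculations: a running week earns 100, maintenance costs 20 and gives a 0.6 chance of running, no maintenance gives 0.3, repair costs 40 and gives 0.6, and replacement costs 150 (the figure used in the Week 2 calculation) with the machine certain to run that week.

```python
# Sketch of the backward DP recursion for the four-week machine problem.
# All numbers are assumptions read off the worked calculations, including
# the replacement cost of 150 used in the Week 2 computation.

PROFIT = 100.0  # gross profit for a week in which the machine runs

# state -> control -> (cost of the control, probability the machine runs this week)
CONTROLS = {
    "running": {"maintain": (20.0, 0.6), "do not maintain": (0.0, 0.3)},
    "broken": {"repair": (40.0, 0.6), "replace": (150.0, 1.0)},
}


def solve():
    """Return (values, policy, J0) for weeks 3, 2, 1 and the week-0 value."""
    J_next = {"running": 0.0, "broken": 0.0}  # terminal reward J_4 = 0
    values, policy = {}, {}
    for k in (3, 2, 1):
        J_k = {}
        for state, controls in CONTROLS.items():
            best_u, best_v = None, float("-inf")
            for u, (cost, p_run) in controls.items():
                # The machine ends the week running w.p. p_run, broken otherwise.
                v = (p_run * (PROFIT + J_next["running"])
                     + (1.0 - p_run) * J_next["broken"] - cost)
                if v > best_v:
                    best_u, best_v = u, v
            J_k[state] = best_v
            policy[(k, state)] = best_u
        values[k] = J_k
        J_next = J_k
    # Week 0: a new machine, taken to run through its first week.
    J0 = PROFIT + values[1]["running"]
    return values, policy, J0


if __name__ == "__main__":
    values, policy, J0 = solve()
    for k in (3, 2, 1):
        for state in ("running", "broken"):
            print(f"J_{k}({state}) = {values[k][state]:.0f}, "
                  f"control = {policy[(k, state)]}")
    print(f"J_0(new machine) = {J0:.0f}")
```

Under these assumptions the sketch should recover the per-week values from the hand calculation (40 and 20 in week 3, 72 and 52 in week 2, 104 and 84 in week 1, and 204 for the new machine), up to floating-point rounding. Keeping the controls in a dictionary makes it easy to change a cost or probability and see how the maximizing control shifts.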