Markov Decision Process - Solving by Linear Programming

OR II
GSLM 52800
Outline
• introduction to discrete-time Markov chains
• problem statement
• long-term average cost per unit time
• solving the MDP by linear programming
States of a Machine
State   Condition
0       Good as new
1       Operable – minor deterioration
2       Operable – major deterioration
3       Inoperable – output of unacceptable quality
Transition of States
(rows: current state; columns: next state)

State    0      1      2      3
0        0      7/8    1/16   1/16
1        0      3/4    1/8    1/8
2        0      0      1/2    1/2
3        0      0      0      1
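For the computations on the later slides it helps to have this matrix available in code; a minimal sketch, assuming NumPy (the name P is illustrative):

import numpy as np

# One-step transition matrix of the deteriorating machine when no action
# intervenes (rows = current state, columns = next state).
P = np.array([
    [0.0, 7/8, 1/16, 1/16],   # state 0: good as new
    [0.0, 3/4, 1/8,  1/8 ],   # state 1: minor deterioration
    [0.0, 0.0, 1/2,  1/2 ],   # state 2: major deterioration
    [0.0, 0.0, 0.0,  1.0 ],   # state 3: inoperable (absorbing if untouched)
])

assert np.allclose(P.sum(axis=1), 1.0)  # every row is a probability distribution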
Possible Actions
Decision   Action                         Relevant States
1          Do nothing                     0, 1, 2
2          Overhaul (return to state 1)   2
3          Replace (return to state 0)    1, 2, 3
Problem
• adopting different collections of actions leads to different long-term average costs per unit time
• problem: find the policy that minimizes the long-term average cost per unit time
Costs of Problem
• cost of defective items: state 0: 0; state 1: 1000; state 2: 3000
• cost of replacing the machine = 4000
• cost of losing production during machine replacement = 2000
• cost of overhauling (at state 2) = 2000
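These components add up to the per-period costs C_{ik} that appear in the linear program later; a worked tabulation (inferring, from the 4000 figure used for an overhauled machine on the later slides, that an overhaul also incurs the 2000 production loss):

\begin{aligned}
C_{01} &= 0, \quad C_{11} = 1000, \quad C_{21} = 3000 && \text{(defective items only)}\\
C_{22} &= 2000 + 2000 = 4000 && \text{(overhaul + lost production)}\\
C_{13} &= C_{23} = C_{33} = 4000 + 2000 = 6000 && \text{(replacement + lost production)}
\end{aligned}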
Policy Rd: Always Replace
When State ≠ 0
• half of the time at state 0, with cost 0
• half of the time at other states, all with cost 6000, because of machine replacement
• average cost per unit time = 3000

[Transition diagram for policy Rd: state 0 moves to states 1, 2, 3 with probabilities 7/8, 1/16, 1/16; states 1, 2, and 3 each move back to state 0 with probability 1.]
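The "half of the time" claim follows directly from the balance equations of the next slide: under Rd every nonzero state returns to state 0 in one step, so

\pi_0 = \pi_1 + \pi_2 + \pi_3 = 1 - \pi_0 \;\Rightarrow\; \pi_0 = \tfrac{1}{2}, \qquad
E(C) = \tfrac{1}{2}(0) + \tfrac{1}{2}(6000) = 3000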
Long-Term Average Cost of
a Positive, Irreducible Discrete-time Markov Chain
• consider a positive, irreducible discrete-time Markov chain with M+1 states 0, …, M
• its stationary probabilities are determined by only M of the balance equations plus the normalization equation (the M+1 balance equations are linearly dependent, so any one of them can be dropped)

balance equations:
\pi_j = \sum_{i=0}^{M} \pi_i p_{ij}, \qquad j = 0, \ldots, M

normalization equation:
\sum_{i=0}^{M} \pi_i = 1
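A minimal sketch of this recipe in code, assuming NumPy (the function name stationary_distribution is illustrative): drop one balance equation, substitute the normalization equation, and solve the resulting linear system.

import numpy as np

def stationary_distribution(P):
    """Stationary probabilities of an irreducible discrete-time Markov chain.

    Solves pi = pi P together with sum(pi) = 1, replacing one redundant
    balance equation by the normalization equation, as described above.
    """
    n = P.shape[0]                # n = M + 1 states
    A = P.T - np.eye(n)           # balance equations: (P^T - I) pi = 0
    A[-1, :] = 1.0                # replace the last one by sum(pi) = 1
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

For an irreducible chain this system is nonsingular, so np.linalg.solve succeeds.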
Policy Ra: Replace at Failure
but Otherwise Do Nothing
0  3
1  78 0  34 1
3/4
1  1 1
2  16
0 8 1 2 2
1  1 1
3  16
0 8 1 2 2
0  1  2  3  1
2 ;  7 ;  2 ;  2
0  13
1 13 2 13 3 13
7/8
0
1
1
1/8
1/16
1
1/16
1/8
1/2
3
1/2
2
(0)0  10001  30002  60003  1923
10
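As a usage example of the sketch above (repeated here self-contained), the chain induced by Ra reproduces these numbers:

import numpy as np

# Transition matrix under Ra: natural deterioration in states 0-2,
# while state 3 is replaced and so returns to state 0 with probability 1.
P_Ra = np.array([
    [0.0, 7/8, 1/16, 1/16],
    [0.0, 3/4, 1/8,  1/8 ],
    [0.0, 0.0, 1/2,  1/2 ],
    [1.0, 0.0, 0.0,  0.0 ],
])

A = P_Ra.T - np.eye(4)
A[-1, :] = 1.0                          # normalization replaces one balance eqt.
pi = np.linalg.solve(A, np.array([0.0, 0.0, 0.0, 1.0]))
print(pi)                               # [2/13, 7/13, 2/13, 2/13]

costs = np.array([0.0, 1000.0, 3000.0, 6000.0])  # per-period cost in each state
print(pi @ costs)                                # about 1923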
Policy Rb: Replace in State 3,
and Overhaul in State 2
0  3
1  78 0  34 1  2
3/4
1  1
2  16
0 8 1
1  1
3  16
0 8 1
0
2 ;
21 1
 75 ; 2 
1
1
1/8
0  1  2  3  1
0 
7/8
2 ;
21 3
1/16
1
1/16

2
21
3
1 1/8
2
(0)0  10001  40002  60003  1667
11
Policy Rc: Replace
in States 2 and 3
 0   2  3
1  78 0  34 1
3/4
1  1
2  16
0 8 1
1  1
3  16
0 8 1
0
7 ;
 11
2
1
1
1/8
0  1  2  3  1
2 ;
0  11
1
7/8
1 ;
 11
3
1
1
 11
1/16
1
1 1/8
1/16
3
2
(0)0  10001  60002  60003  1727
12
Problem
• in this case the minimum-cost policy is Rb, i.e., replace in state 3 and overhaul in state 2
• question: is there an efficient way to find the minimum-cost policy when there are many states and many types of actions?
• checking all possible policies one by one quickly becomes impractical
Linear Programming Approach
for an MDP
• let D_{ik} = the probability of adopting decision k at state i
• \pi_i = the stationary probability of state i
• y_{ik} = P(state i and decision k)
• C_{ik} = the cost of adopting decision k at state i
Linear Programming Approach
for an MDP
y_{ik} = \pi_i D_{ik}, \qquad i = 0, \ldots, M

in terms of the y_{ik}, the normalization and balance equations become:

\sum_{i=0}^{M} \pi_i = 1 \;\Longrightarrow\; \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} = 1

\pi_j = \sum_{i=0}^{M} \pi_i p_{ij} \;\Longrightarrow\; \sum_{k=1}^{K} y_{jk} = \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik}\, p_{ij}(k), \qquad j = 0, 1, \ldots, M

y_{ik} \ge 0, \qquad i = 0, 1, \ldots, M; \; k = 1, \ldots, K

E(C) = \sum_{i=0}^{M} \sum_{k=1}^{K} \pi_i C_{ik} D_{ik} = \sum_{i=0}^{M} \sum_{k=1}^{K} C_{ik} y_{ik}
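Conversely, once the y_{ik} are known, the randomized policy is recovered by conditioning (a standard identity, stated here for completeness):

D_{ik} = P(\text{decision } k \mid \text{state } i) = \frac{y_{ik}}{\sum_{k'=1}^{K} y_{ik'}} = \frac{y_{ik}}{\pi_i}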
Linear Programming Approach
for an MDP
\min Z = \sum_{i=0}^{M} \sum_{k=1}^{K} C_{ik} y_{ik},

s.t.
\sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik} = 1,

\sum_{k=1}^{K} y_{jk} - \sum_{i=0}^{M} \sum_{k=1}^{K} y_{ik}\, p_{ij}(k) = 0, \qquad j = 0, 1, \ldots, M,

y_{ik} \ge 0, \qquad i = 0, 1, \ldots, M; \; k = 1, \ldots, K

• at an optimal solution, each D_{ik} = 0 or 1, i.e., a deterministic policy is used
Linear Programming Approach
for an MDP
• actions possibly adopted at each state:
  • state 0: do nothing (i.e., k = 1)
  • state 1: do nothing or replace (i.e., k = 1 or 3)
  • state 2: do nothing, overhaul, or replace (i.e., k = 1, 2, or 3)
  • state 3: replace (i.e., k = 3)
• variables: y01, y11, y13, y21, y22, y23, and y33
Linear Programming Approach
for an MDP
\min Z = 1000 y_{11} + 6000 y_{13} + 3000 y_{21} + 4000 y_{22} + 6000 y_{23} + 6000 y_{33},

s.t.
y_{01} + y_{11} + y_{13} + y_{21} + y_{22} + y_{23} + y_{33} = 1
y_{01} - (y_{13} + y_{23} + y_{33}) = 0
y_{11} + y_{13} - (\tfrac{7}{8} y_{01} + \tfrac{3}{4} y_{11} + y_{22}) = 0
y_{21} + y_{22} + y_{23} - (\tfrac{1}{16} y_{01} + \tfrac{1}{8} y_{11} + \tfrac{1}{2} y_{21}) = 0
y_{33} - (\tfrac{1}{16} y_{01} + \tfrac{1}{8} y_{11} + \tfrac{1}{2} y_{21}) = 0
y_{ik} \ge 0 for all seven variables

(the coefficients p_{ij}(k) come from the table on the Transition of States slide)
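A sketch of solving this particular LP numerically, assuming SciPy (the variable ordering below is illustrative):

from scipy.optimize import linprog

# Variable order: y01, y11, y13, y21, y22, y23, y33
c = [0, 1000, 6000, 3000, 4000, 6000, 6000]

# Equality constraints: normalization plus the four balance equations above.
# One balance equation is redundant, which the solver's presolve tolerates.
A_eq = [
    [1,     1,    1,  1,    1, 1,  1],   # all y sum to 1
    [1,     0,   -1,  0,    0, -1, -1],  # balance, state 0
    [-7/8,  1/4,  1,  0,   -1, 0,  0],   # balance, state 1
    [-1/16, -1/8, 0,  1/2,  1, 1,  0],   # balance, state 2
    [-1/16, -1/8, 0, -1/2,  0, 0,  1],   # balance, state 3
]
b_eq = [1, 0, 0, 0, 0]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 7, method="highs")
print(res.x)    # about [2/21, 5/7, 0, 0, 2/21, 0, 2/21]
print(res.fun)  # about 1666.7, the long-term average cost of policy Rb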
Linear Programming Approach
for an MDP
• solving: y01 = 2/21, y11 = 5/7, y13 = 0, y21 = 0, y22 = 2/21, y23 = 0, y33 = 2/21
• optimal policy:
  • state 0: do nothing
  • state 1: do nothing
  • state 2: overhaul
  • state 3: replace
• this is exactly policy Rb, the minimum-cost policy found earlier
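As a closing check, the optimal objective value reproduces the cost of Rb, and conditioning on the state recovers the deterministic decisions:

Z^* = 1000\left(\tfrac{5}{7}\right) + 4000\left(\tfrac{2}{21}\right) + 6000\left(\tfrac{2}{21}\right) = \tfrac{35000}{21} \approx 1667

D_{22} = \frac{y_{22}}{y_{21} + y_{22} + y_{23}} = \frac{2/21}{2/21} = 1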