Modeling Long Term Care and Supportive Housing

Marisela Mainegra Hing
Telfer School of Management
University of Ottawa
Canadian Operational Research Society,
May 18, 2011
Outline
• Long Term Care and Supportive Housing
• Queueing Models
• Dynamic Programming Model
• Approximate Dynamic Programming
LTC problem
[Diagram: arrival streams λC and λRC at the Community and λH and λRH at the Hospital feed the LTC station, which has service rate μLTC and capacity CLTC.]
Goal:
• Hospital level below a given threshold
• Community waiting times below 90 days
LTC previous results
• An MDP model determined a threshold policy for the Hospital, but it did not take community demand into account.
• A simulation model determined that the current capacity is insufficient to achieve the goal.
Queueing Model
[Diagram: Community arrivals (λC, λRC) generate a stream λC-LTC and Hospital arrivals (λH, λRH) generate a stream λH-LTC into the LTC station (μLTC, CLTC); Hospital patients who renege flow into the H_renege station (μRH).]
Station LTC: M/M/CLTC
Station H_renege: M/M/∞
Queueing Model
Station LTC: M/M/CLTC
Steady-state condition:
$$\rho_{LTC} = \frac{\lambda_{LTC}}{C_{LTC}\,\mu_{LTC}} < 1$$
The probability that no patients are in the system:
$$p_0 = \left[\,\sum_{c=0}^{C_{LTC}-1} \frac{(\lambda_{LTC}/\mu_{LTC})^{c}}{c!} + \frac{(\lambda_{LTC}/\mu_{LTC})^{C_{LTC}}}{C_{LTC}!\left(1 - \dfrac{\lambda_{LTC}}{C_{LTC}\,\mu_{LTC}}\right)}\,\right]^{-1}$$
The average number of patients in the waiting line:
$$L_{q_{LTC}} = \frac{(\lambda_{LTC}/\mu_{LTC})^{C_{LTC}}\,\lambda_{LTC}\,\mu_{LTC}}{(C_{LTC}-1)!\,\left(C_{LTC}\,\mu_{LTC} - \lambda_{LTC}\right)^{2}}\; p_0$$
The average time a client spends in the waiting line:
$$W_{q_{LTC}} = \frac{L_{q_{LTC}}}{\lambda_{LTC}}$$
The number of patients from the Hospital that are in the queue for LTC:
$$L_{q_{H\text{-}LTC}} = \frac{\lambda_{H\text{-}LTC}}{\lambda_{LTC}}\, L_{q_{LTC}}$$
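A minimal numerical sketch of these M/M/c quantities (Python; the rates below are illustrative placeholders, not the CCAC data, and for capacities in the thousands the sums should be evaluated in log space or with a stable recursion to avoid overflow):

```python
from math import factorial

def mmc_queue(lam, mu, c):
    """Standard M/M/c steady-state quantities (p0, Lq, Wq) as above.

    lam: arrival rate, mu: per-server service rate, c: number of servers.
    Requires rho = lam / (c * mu) < 1.
    """
    rho = lam / (c * mu)
    if rho >= 1:
        raise ValueError("unstable queue: rho >= 1")
    r = lam / mu
    p0 = 1.0 / (sum(r**k / factorial(k) for k in range(c))
                + r**c / (factorial(c) * (1 - rho)))
    lq = (r**c * lam * mu) / (factorial(c - 1) * (c * mu - lam) ** 2) * p0
    wq = lq / lam
    return p0, lq, wq

# Small illustrative numbers only: 10 beds at 80% utilization.
p0, lq_ltc, wq_ltc = mmc_queue(lam=8.0, mu=1.0, c=10)
lq_h_ltc = (3.0 / 8.0) * lq_ltc   # placeholder lambda_H-LTC / lambda_LTC split
```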
Queueing Model
Station H_renege: M/M/∞
• The average number of patients in the system is
$$L_{RH} = \frac{\lambda_{RH}}{\mu_{RH}}$$
Queueing Model
Data analysis
• Data on all hospital demand arriving to the CCAC from April 1st, 2006 to May 15th, 2009.
• ρLTC = 1.6269 for the current capacity CLTC = 4530.
• To have ρLTC < 1 we need CLTC > 7370.08, i.e. 2841 (62.71%) more beds than the current capacity.
• With CLTC > 7370 we apply the formulas.
• Given a threshold T for the Hospital patients and the total number LqLTC of patients waiting to go to LTC, we want to determine the capacity CLTC in LTC such that:
$$\frac{\lambda_{H\text{-}LTC}}{\lambda_{LTC}}\, L_{q_{LTC}} \le T - L_{RH}$$
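A sketch of the capacity search this condition implies (Python; it reuses the hypothetical mmc_queue helper sketched earlier with placeholder rates, and it steps one bed at a time, whereas the slides report only 19 iterations, so the actual search presumably used a coarser scheme):

```python
def required_capacity(lam_ltc, mu_ltc, lam_h_ltc, lam_rh, mu_rh, T, c_start, step=1):
    """Grow C_LTC until the expected number of Hospital patients waiting
    (their share of the LTC queue plus the M/M/inf renege station) is <= T."""
    l_rh = lam_rh / mu_rh                       # L_RH = lambda_RH / mu_RH
    c = c_start
    while True:
        if lam_ltc < c * mu_ltc:                # only evaluate stable capacities
            _, lq_ltc, wq_ltc = mmc_queue(lam_ltc, mu_ltc, c)
            lq_h_ltc = (lam_h_ltc / lam_ltc) * lq_ltc
            if lq_h_ltc <= T - l_rh:
                return c, wq_ltc                # capacity and the community wait
        c += step

# Placeholder rates only (depends on mmc_queue from the earlier sketch):
# c_req, wq = required_capacity(8.0, 1.0, 3.0, 0.1, 1.0, T=1.0, c_start=9)
```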
Queueing Model
Results
• 19 iterations of capacity values.
• The goal is achieved with capacity 7389: the average waiting time is 31 days and the average number of Hospital patients waiting in the queue is 130 (T = 134).
• This required capacity is 2859 (63.1%) more beds than the current capacity.
Queueing Model with SH
[Diagram: Hospital arrivals (λH, λRH) split into λH-SH, λH-LTC and the H_renege station (μRH); Community arrivals (λC, λRC) split into λC-SH and λC-LTC; the SH station (μSH, CSH) sends patients to LTC via λSH-LTC; the LTC station has service rate μLTC and capacity CLTC.]
Queueing Model with SH
Results
• Required capacity in LTC is 6835, i.e. 2305 (50.883%) more beds than the current capacity (4530).
• Required capacity in SH is 1169.
• With capacity values of 6835 at LTC and 1169 at SH, there are 133.9943 (T = 134) Hospital patients waiting for care (LTC: 110.3546, reneging: 22.7475, SH: 0.89229), and Community patients wait for care on average (days) 34.8799 at LTC and 3.2433 at SH.
Semi-MDP Model
State space: S = {(DH_LTC, DH_SH, DC_LTC, DC_SH, DSH_LTC, CLTC, CSH, p) }
Action space: A = {0,..,max(TCLTC,TCSH)}
Transition time:
$$d(s,a) = \begin{cases} 0, & p \in \{1,2\} \\ 1, & p = 3 \end{cases}$$
Transition probabilities:
$$\Pr(s,a,s') = \begin{cases} 1, & p \in \{1,2\} \\ \displaystyle\prod_{i=1}^{5}\Pr(x_i)\,\prod_{j=1}^{2}\Pr(y_j), & p = 3 \end{cases}$$
Immediate reward:
$$r(s,a) = \begin{cases} 0, & p \in \{1,2\} \\ -\left(\left(D_{H\_LTC} + D_{H\_SH} - T_{H}\right) + \dfrac{D_{C\_LTC}}{WT_{C\_LTC}} + \dfrac{D_{C\_SH}}{WT_{C\_SH}}\right), & p = 3 \end{cases}$$
Optimal Criterion: Total expected discounted reward
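A minimal encoding of these primitives as reconstructed above (Python; the field names, the phase convention for p, and especially the reward expression follow my reading of the garbled slide and should be treated as assumptions):

```python
from dataclasses import dataclass

@dataclass
class State:
    d_h_ltc: int   # Hospital demand waiting for LTC
    d_h_sh: int    # Hospital demand waiting for SH
    d_c_ltc: int   # Community demand waiting for LTC
    d_c_sh: int    # Community demand waiting for SH
    d_sh_ltc: int  # SH residents waiting for LTC
    c_ltc: int     # LTC capacity
    c_sh: int      # SH capacity
    p: int         # decision phase: 1 and 2 are instantaneous, 3 advances time

def transition_time(s: State, a: int) -> int:
    """d(s, a): zero in the instantaneous phases, one period otherwise."""
    return 0 if s.p in (1, 2) else 1

def reward(s: State, a: int, t_h: float, wt_c_ltc: float, wt_c_sh: float) -> float:
    """r(s, a) as reconstructed: penalize Hospital demand above the threshold T_H
    and Community demand relative to the waiting-time targets."""
    if s.p in (1, 2):
        return 0.0
    return -((s.d_h_ltc + s.d_h_sh - t_h)
             + s.d_c_ltc / wt_c_ltc
             + s.d_c_sh / wt_c_sh)
```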
Approximate Dynamic Programming
Goal: find π : S → A that maximizes the state-action value function
$$Q^{\pi}(s,a) = E\left[\left.\sum_{k \ge 0} \gamma^{T_k}\, r_k \,\right|\, s_0 = s,\, a_0 = a\right] = r(s,a) + \gamma^{d(s,a)} \sum_{s'} p_{s'} \max_{a'} Q(s',a')$$
γ: discount factor
Bellman: there exists an optimal Q*, Q* = max_π Q^π(s,a), and the optimal policy π* is
$$\pi^*(s) = \arg\max_{a} Q^*(s,a)$$
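A tabular sketch of the Q-learning update implied by this criterion (Python; the env.reset/env.step interface and the ε-greedy details are assumptions for illustration, and the constant α and ε could be replaced by the decaying schedules on the later slides):

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes, gamma=0.99, alpha=0.001, epsilon=0.1):
    """Watkins Q-learning with semi-MDP discounting gamma ** d(s, a)."""
    Q = defaultdict(float)                      # backup table: Q[(state, action)]
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy behaviour: explore, otherwise act greedily on Q
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a_: Q[(s, a_)])
            s_next, r, d, done = env.step(a)    # d = transition time d(s, a)
            target = r + gamma ** d * max(Q[(s_next, a_)] for a_ in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q
```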
Reinforcement Learning
[Diagram: the agent-environment loop — the agent observes the state, takes an action, and receives a reinforcement signal.]
RL: environment
[Diagram: the ENVIRONMENT receives the current state and an action, applies its transition probabilities and reward function, and returns the next state and the immediate reward.]
RL: Agent
[Diagram: the agent's Knowledge is Q(s,a). Its Behaviour, given the state and reward, selects action = arg max_a Q(s,a) or an exploratory action. Its Learning updates the Q-values.]
Knowledge representation (FA):
• Backup table
• Neural network
• ...
Learning method:
• Watkins QL
• Sarsa(λ)
• ...
QL: parameters
• θ: number of hidden neurons.
• T: number of iterations of the learning process.
• α0: initial value of the learning rate.
• ε0: initial value of the exploration rate.
• Learning-rate decreasing function:
$$\alpha(t) = \frac{\alpha_0}{1 + t/T_{\alpha}}, \qquad t = 1..T$$
• Exploration-rate decreasing function:
$$\varepsilon(t) = \frac{\varepsilon_0}{1 + t/T_{\varepsilon}}, \qquad t = 1..T$$
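A small sketch of these decay schedules as read above (Python; the exact functional form 1/(1 + t/T) is my reconstruction of the garbled slide):

```python
def learning_rate(t, alpha0=1e-3, t_alpha=1e3):
    """alpha(t) = alpha0 / (1 + t / T_alpha): decays from alpha0 toward zero."""
    return alpha0 / (1.0 + t / t_alpha)

def exploration_rate(t, eps0=1.0, t_eps=1e3):
    """epsilon(t) = eps0 / (1 + t / T_eps): explore heavily early, act greedier later."""
    return eps0 / (1.0 + t / t_eps)

# Example: over T = 10**4 iterations, exploration drops from 1.0 to about 0.09.
rates = [exploration_rate(t) for t in range(1, 10**4 + 1)]
```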
QL: algorithm
[Flowchart: Q-Learning(T, θ, α0, ε0, Tα, Tε) — the learning and exploration rates drive the exploration vs. exploitation trade-off over the T iterations.]
QL: tuning parameters
(observed regularities)
1. (θ, v)-scheme: T = 10^4, α0 = 10^-3, ε0 = 1, Tα = 10^3, Tε = v·10^3, v ∈ {1, 2, ...}. PR(θ, v): best performance with the (θ, v)-scheme.
2. PR(θ, v) increases monotonically with v up to a certain value v(θ).
3. PR(θ, v) increases monotonically with θ up to a certain value θ(v).
4. v(θ) and θ(v) depend on the problem instance.
QL: tuning parameters
(methodology: learning schedule given PRHeu)
1. ∆θ = 50, θ = 0, PRθ = 0, ν = 0, vbest = 1
2. While PRθ < PRheu or no-stop:
   1. θ = θ + ∆θ, PRbest = 0
   2. While PRbest ≥ PRν:
      1. ν = ν + 1, T = 10^4, Tα = 10^3
      2. PRν = PRbest
      3. For v = vbest to ν:
         • Tε = v·10^3
         • PR[v] = Q-Learning(T, θ, 10^-3, 1, Tα, Tε)
      4. [PRbest, vbest] = max(PR)
   3. PRθ = PRbest
Discussion
• For given capacities, solve the SMDP with QL.
• Model other LTC complexities:
  • different facilities and room accommodations,
  • client choice, and
  • level of care.
Thank you for your attention
 Questions?
Neural Network for Q(s,a)
[Diagram: a neural network taking the state s and the action a as inputs and producing Q(s,a) as output.]
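A minimal sketch of such an approximator (Python/NumPy; one hidden layer of θ sigmoid units with a linear output, trained by plain gradient steps toward a supplied target such as the Q-learning target; the layer sizes, activation, and update rule are illustrative assumptions):

```python
import numpy as np

class QNetwork:
    """One-hidden-layer approximator: inputs are the state features plus the
    action, theta sigmoid hidden units, and a linear output estimating Q(s, a)."""

    def __init__(self, n_state, theta, seed=0):
        rng = np.random.default_rng(seed)
        n_in = n_state + 1                     # state features + scalar action
        self.W1 = rng.normal(0.0, 0.1, (theta, n_in))
        self.b1 = np.zeros(theta)
        self.w2 = rng.normal(0.0, 0.1, theta)
        self.b2 = 0.0

    def q(self, s, a):
        x = np.append(np.asarray(s, dtype=float), float(a))
        h = 1.0 / (1.0 + np.exp(-(self.W1 @ x + self.b1)))   # sigmoid hidden layer
        return float(self.w2 @ h + self.b2)

    def update(self, s, a, target, lr):
        """One gradient step on 0.5 * (Q(s, a) - target)**2, where `target` would
        be the Q-learning target r + gamma**d * max_a' Q(s', a')."""
        x = np.append(np.asarray(s, dtype=float), float(a))
        h = 1.0 / (1.0 + np.exp(-(self.W1 @ x + self.b1)))
        err = float(self.w2 @ h + self.b2) - target
        grad_h = err * self.w2 * h * (1.0 - h)     # backprop before updating w2
        self.w2 -= lr * err * h
        self.b2 -= lr * err
        self.W1 -= lr * np.outer(grad_h, x)
        self.b1 -= lr * grad_h
```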