About the latest complexity bounds for Policy Iteration
Romain Hollanders, UCLouvain
Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers
Benelux Meeting in Systems and Control 2015
Policy Iteration to solve Markov Decision Processes
Order-Regular matrices: a powerful tool for the analysis
starting state
How much will we pay in the long run?
cost vector
discount factor
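
As an illustration of the question above, here is a minimal sketch of how the long-run cost of a Markov chain could be computed. It assumes the discounted-sum criterion, which the slide suggests but does not spell out. The names `P` (transition matrix), `c` (cost vector), `gamma` (discount factor) and `discounted_cost` are hypothetical and do not come from the talk.

```python
def discounted_cost(P, c, gamma, tol=1e-12, max_iter=100_000):
    """Approximate the solution of v = c + gamma * P v by fixed-point iteration.

    v[s] is then the expected discounted cost paid when starting in state s.
    """
    n = len(c)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = [
            c[s] + gamma * sum(P[s][t] * v[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once successive iterates agree up to the tolerance.
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Hypothetical two-state chain: state 0 costs 1 per step and moves to the
# free, absorbing state 1 with probability 0.5.
P = [[0.5, 0.5],
     [0.0, 1.0]]
c = [1.0, 0.0]
gamma = 0.9
v = discounted_cost(P, c, gamma)
```

For this chain the fixed point can be checked by hand: v[1] = 0 and v[0] = 1 / (1 - 0.9 * 0.5).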
Markov chains
Markov Decision Processes
transition probability
action cost
action
Goal: find the optimal policy
The value of a policy = the long term costs of the corresponding Markov chain
what we aim for!
Proposition: there always exists
How do we solve a Markov Decision Process?
Policy Iteration
POLICY ITERATION
0. Choose an initial policy
while the policy keeps changing
    1. Evaluate: compute the value of the current policy
    2. Improve: the new policy is the best action in each state, based on that value
end
Stop! We found the optimal policy
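
The evaluate/improve loop can be sketched in a few lines of Python. This is only an illustrative sketch under assumptions that are not taken from the talk: costs are discounted and minimized, `actions[s]` is a hypothetical list of `(cost, transition_probabilities)` pairs for state `s`, and evaluation is done by simple fixed-point iteration rather than by solving a linear system.

```python
def q_value(actions, v, gamma, s, a):
    """Cost of playing action a in state s, then following values v."""
    cost, probs = actions[s][a]
    return cost + gamma * sum(p * v[t] for t, p in enumerate(probs))


def evaluate(actions, policy, gamma, tol=1e-12, max_iter=100_000):
    """Step 1: approximate the value of a fixed policy."""
    v = [0.0] * len(actions)
    for _ in range(max_iter):
        new_v = [q_value(actions, v, gamma, s, policy[s])
                 for s in range(len(actions))]
        done = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v = new_v
        if done:
            break
    return v


def policy_iteration(actions, gamma):
    policy = [0] * len(actions)              # Step 0: initial policy
    while True:
        v = evaluate(actions, policy, gamma)  # Step 1: evaluate
        improved = []
        for s in range(len(actions)):         # Step 2: improve
            best = min(range(len(actions[s])),
                       key=lambda a: q_value(actions, v, gamma, s, a))
            gain = (q_value(actions, v, gamma, s, policy[s])
                    - q_value(actions, v, gamma, s, best))
            # Switch only on a strict improvement, so ties cannot cause cycling.
            improved.append(best if gain > 1e-9 else policy[s])
        if improved == policy:                # no change: stop
            return policy, v
        policy = improved


# Hypothetical MDP with two states and two actions per state.
actions = [
    [(2.0, [1.0, 0.0]), (1.0, [0.0, 1.0])],  # state 0
    [(1.0, [0.0, 1.0]), (0.0, [0.0, 1.0])],  # state 1
]
policy, v = policy_iteration(actions, gamma=0.5)
```

On this toy instance the loop switches both states to their second action after one improvement step and then stops, since no further strict improvement exists.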
Bad news:
Policy Iteration has exponential complexity
At least in general… [Fearnley 2010, Friedmann 2009, H. et al. 2012]
But we still aim for upper bounds…
Policy Iteration needs at most … iterations
Policy Iteration needs at most … iterations [Mansour & Singh 1999]
Policy Iteration needs at most … iterations [H. et al. 2014]
not possible to improve using “standard” tools
Can we do even better?
The matrix is “Order-Regular”
How large are the largest Order-Regular matrices that we can build?
The answer of exhaustive search
Conjecture (Hansen & Zwick, 2012)
the golden ratio
the … Fibonacci number
Theorem (H. et al., 2014)
for
(Proof: a “smart” exhaustive search)
A constructive approach
Iterate and build matrices of size
Can we do better?
Yes!
We can build matrices of size
So, what do we know about Order-Regular matrices?
currently the best bounds for MDPs
For papers and slides
perso.uclouvain.be/romain.hollanders/
…and much more