About the latest complexity bounds for Policy Iteration
Romain Hollanders, UCLouvain
Joint work with: Balázs Gerencsér, Jean-Charles Delvenne and Raphaël Jungers
Benelux Meeting in Systems and Control 2015

Policy Iteration to solve Markov Decision Processes
Order-Regular matrices: a powerful tool for the analysis

How much will we pay in the long run?
[Figure labels: starting state, cost vector, discount factor]

Markov chains and Markov Decision Processes
[Figure labels: transition probability (Markov chains); action, cost (Markov Decision Processes)]

Goal: find the optimal policy
The value of a policy = the long-term costs of the corresponding Markov chain
Proposition: there always exists [...] (what we aim for!)

How do we solve a Markov Decision Process? Policy Iteration

POLICY ITERATION
0. Choose an initial policy
while [...]
    1. Evaluate: compute [...]
    2. Improve: [...] is the best action in each state based on [...]
end
Stop! We found the optimal policy.

Bad news: Policy Iteration has exponential complexity, at least in general [Fearnley 2010, Friedmann 2009, H. et al. 2012].

But we still aim for upper bounds:
Policy Iteration needs at most [...] iterations
Policy Iteration needs at most [...] iterations [Mansour & Singh 1999]
Policy Iteration needs at most [...] iterations [H. et al. 2014] (not possible to improve using "standard" tools)

Can we do even better?

The matrix [...] is “Order-Regular”

How large are the largest Order-Regular matrices that we can build?

The answer of exhaustive search: ??
Conjecture (Hansen & Zwick, 2012): [...] (annotations: the golden ratio, the Fibonacci number)
Theorem (H. et al., 2014): [...] for [...] (Proof: a “smart” exhaustive search)

How large are the largest Order-Regular matrices that we can build?

A constructive approach
Iterate and build matrices of size [...]
Can we do better? Yes! We can build matrices of size [...]

So, what do we know about Order-Regular matrices?
[...] currently the best bounds for MDPs

For papers and slides: perso.uclouvain.be/romain.hollanders/
…and much more
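
The Policy Iteration loop from the slides (choose an initial policy, evaluate it, switch to the best action in each state, stop when nothing improves) can be sketched as follows. This is a minimal illustration and not the authors' code. It assumes a discounted-cost MDP in which costs are minimized, and the data layout (`P[a][s][t]` for transition probabilities, `c[a][s]` for costs), the helper names and the tiny two-state example are all hypothetical. Policy evaluation is done here by solving the linear system (I - gamma * P_pi) v = c_pi with a small pure-Python Gaussian elimination, which is one possible choice among others.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def evaluate(policy, P, c, gamma):
    """Step 1 (Evaluate): solve (I - gamma * P_pi) v = c_pi."""
    n = len(policy)
    A = [[(1.0 if s == t else 0.0) - gamma * P[policy[s]][s][t]
          for t in range(n)] for s in range(n)]
    b = [c[policy[s]][s] for s in range(n)]
    return solve_linear(A, b)


def q_value(s, a, v, P, c, gamma):
    """Cost of playing action a in state s, then following values v."""
    return c[a][s] + gamma * sum(P[a][s][t] * v[t] for t in range(len(v)))


def policy_iteration(P, c, gamma, policy=None, tol=1e-12):
    """Return (policy, values, number of evaluated policies)."""
    n_actions, n = len(P), len(P[0])
    policy = list(policy) if policy is not None else [0] * n
    iterations = 0
    while True:
        v = evaluate(policy, P, c, gamma)
        iterations += 1
        # Step 2 (Improve): take the best action in each state based on v.
        new_policy = policy[:]
        for s in range(n):
            best = min(range(n_actions),
                       key=lambda a: q_value(s, a, v, P, c, gamma))
            current = q_value(s, policy[s], v, P, c, gamma)
            if q_value(s, best, v, P, c, gamma) < current - tol:
                new_policy[s] = best
        if new_policy == policy:  # no state can improve: stop
            return policy, v, iterations
        policy = new_policy


# Hypothetical example: 2 states, 2 actions (0 = stay, 1 = switch state).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: move to the other state
]
c = [
    [1.0, 0.0],  # cost of action 0 in states 0 and 1
    [0.5, 2.0],  # cost of action 1 in states 0 and 1
]
best_policy, values, n_iter = policy_iteration(P, c, gamma=0.9)
print(best_policy, values, n_iter)
```

In this sketch every state that can strictly improve switches at once, which matches the "best action in each state" rule on the slides. The iteration counter is the quantity that the upper and lower bounds discussed in the talk refer to.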