Appendix B

Learning Rate Annealing

Consider the following conditions on the learning rate $\alpha_t$, which depends on $t$. Here we use $t$ to denote the number of visits of a specific state/action pair.

1. $\sum_{t=1}^{\infty} \alpha_t = \infty$

2. $\sum_{t=1}^{\infty} \alpha_t^2 < \infty$

The reason for the first condition is that by updating with learning rate $\alpha_t$, each (finite) distance between the optimal parameter setting and the initial parameters can be overcome. The reason for the second condition is that the variance goes to zero, so that convergence in the limit will take place (otherwise our solution will lie inside a ball of the size of the variance containing the optimal parameter setting, but the true single-point solution will remain unknown).

We want to choose a function $f(t)$ and set $\alpha_t = f(t)$. In (Bertsekas and Tsitsiklis, 1996), $f(t) = \frac{1}{t}$ is proposed. Here we propose to use more general functions of the type $f(t) = \frac{1}{t^{\beta}}$, with $\beta > 0$, and will prove that for $\frac{1}{2} < \beta \leq 1$ the two conditions on the learning rate are satisfied.

The sum, viewed as a function of its upper limit, is a step function. In the following we use continuous functions in order to make it easy to compute the admissible range of $\beta$. Since there is a mismatch between a continuous function and the step function, we use two functions: one which returns a smaller value than the sum (the lower bound function), and another which returns a larger value (the upper bound function).

For the first condition we use a lower bound on the sum:

$$\sum_{t=1}^{\infty} \frac{1}{t^{\beta}} \;\geq\; \int_{1}^{\infty} \frac{1}{t^{\beta}} \, dt \;=\; \left[ \frac{t^{1-\beta}}{1-\beta} \right]_{1}^{\infty} \;=\; \infty \quad \text{for } \beta < 1.$$

From this (and from dealing with the special case $\beta = 1$ separately, where the sum is the divergent harmonic series) follows: $\beta \leq 1$.

For the second condition we use an upper bound on the sum:

$$\sum_{t=1}^{\infty} \frac{1}{t^{2\beta}} \;\leq\; 1 + \int_{2}^{\infty} \frac{1}{(t-1)^{2\beta}} \, dt \;=\; 1 + \left[ \frac{(t-1)^{1-2\beta}}{1-2\beta} \right]_{2}^{\infty} \;=\; 1 + \frac{1}{2\beta - 1} \;<\; \infty \quad \text{for } \beta > \tfrac{1}{2}.$$

From this follows: $\beta > \frac{1}{2}$.

Hence we can use the following family of methods for decreasing the learning rate: $f(t) = \frac{1}{t^{\beta}}$ with $\frac{1}{2} < \beta \leq 1$. Since in general we want to choose the learning rate as high as possible, $\beta$ should be chosen close to $\frac{1}{2}$. In practical simulations, constant learning rates (i.e. $\beta = 0$) are often used, but this may prevent convergence.
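As an illustration of the schedule, the following is a minimal Python sketch (not the thesis implementation) of a tabular Q-learning update that anneals the learning rate as $\alpha_t = 1/t^{\beta}$, with $t$ the per-pair visit count. The class and method names, the choice $\beta = 0.6$, and the discount factor $\gamma = 0.95$ are illustrative assumptions, not taken from the original text.

```python
import numpy as np

class QTable:
    """Tabular Q-learning with per-pair annealed learning rates.

    A sketch under the assumptions above, not the thesis implementation.
    """

    def __init__(self, n_states, n_actions, beta=0.6, gamma=0.95):
        # The derivation in this appendix requires 1/2 < beta <= 1.
        assert 0.5 < beta <= 1.0, "conditions require 1/2 < beta <= 1"
        self.q = np.zeros((n_states, n_actions))
        self.visits = np.zeros((n_states, n_actions), dtype=int)
        self.beta = beta
        self.gamma = gamma

    def learning_rate(self, s, a):
        # f(t) = 1 / t**beta, where t counts visits of this state/action pair.
        t = self.visits[s, a]
        return 1.0 / t ** self.beta

    def update(self, s, a, reward, s_next):
        # One Q-learning step with the annealed learning rate.
        self.visits[s, a] += 1
        alpha = self.learning_rate(s, a)
        target = reward + self.gamma * self.q[s_next].max()
        self.q[s, a] += alpha * (target - self.q[s, a])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = QTable(n_states=1, n_actions=1)
    for _ in range(10_000):
        # Noisy reward with mean 1.0; the next state is the same state,
        # so the fixed point is Q = 1/(1 - gamma) = 20.
        table.update(0, 0, rng.normal(1.0, 0.5), 0)
    print(table.q[0, 0])  # approaches 20 despite the noise
```

Choosing $\beta$ close to $\frac{1}{2}$ keeps the learning rate high for longer, as recommended above, while $\beta = 1$ recovers the $\frac{1}{t}$ schedule of (Bertsekas and Tsitsiklis, 1996); a constant rate ($\beta = 0$) would fail the assertion, mirroring the convergence caveat in the text.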