Appendix B
Learning Rate Annealing
Consider the following conditions on the learning rate $\alpha_t$, which depends on $t$. Here we use $t$ to denote the number of visits of a specific state/action pair.

1. $\sum_{t=1}^{\infty} \alpha_t = \infty$

2. $\sum_{t=1}^{\infty} \alpha_t^2 < \infty$
The reason for the first condition is that by updating with learning rate $\alpha_t$, each (finite) distance between the optimal parameter setting and the initial parameters can be overcome. The reason for the second condition is that the variance goes to zero, so that convergence in the limit will take place (otherwise our solution will lie inside a ball with the size of the variance containing the optimal parameter setting, but the true single-point solution will remain unknown). We want to choose a function $f(t)$ and set $\alpha_t = f(t)$. In (Bertsekas and Tsitsiklis, 1996) $f(t) = \frac{1}{t}$ is proposed. Here we propose to use more general functions of the type $f(t) = \frac{1}{t^{\beta}}$, with $\beta > 0$, and will prove that for $\frac{1}{2} < \beta \leq 1$ the two conditions on the learning rate are satisfied.
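As an illustration, here is a minimal Python sketch of such a schedule; the function name and the example values of $\beta$ are ours and not taken from the thesis.

```python
# Minimal sketch (not from the thesis) of the proposed schedule
# alpha_t = f(t) = 1 / t**beta, where t is the number of visits of a
# specific state/action pair.
def annealed_learning_rate(t, beta):
    """Learning rate used at the t-th visit (t >= 1)."""
    return 1.0 / t ** beta

# Compare a few schedules over the first visits.
for t in (1, 10, 100, 1000):
    print(t, annealed_learning_rate(t, 0.6), annealed_learning_rate(t, 1.0))
```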
The partial sum is a step function of its upper limit. In the following we use continuous functions (integrals) in order to make it easy to compute the admissible range of $\beta$. Since there is a mismatch between a continuous function and a step function, we use two functions: one which returns a smaller value than the sum (the lower bound function), and another which returns a larger value (the upper bound function).
For the first condition we use a lower bound on the sum:

$$\sum_{t=1}^{\infty} \frac{1}{t^{\beta}} \;\geq\; \int_{1}^{\infty} \frac{1}{t^{\beta}} \, dt \;=\; \left[ \frac{t^{1-\beta}}{1-\beta} \right]_{1}^{\infty}$$

For $\beta < 1$ this integral diverges. From this (and from dealing with the special case $\beta = 1$ separately, where the integral becomes $[\ln t]_{1}^{\infty}$ and also diverges) follows: the first condition is satisfied for $\beta \leq 1$.
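The inequality can also be checked numerically; the following sketch is purely illustrative (not part of the proof) and compares the partial sums with the integral lower bound for two values of $\beta$.

```python
import math

# Illustrative check (not a proof): the partial sum of 1/t**beta dominates
# the integral of 1/t**beta from 1 to n, and both keep growing for beta <= 1.
def partial_sum(beta, n):
    return sum(1.0 / t ** beta for t in range(1, n + 1))

def integral_lower_bound(beta, n):
    # integral from 1 to n of t**(-beta) dt, with beta = 1 handled separately
    if beta == 1.0:
        return math.log(n)
    return (n ** (1.0 - beta) - 1.0) / (1.0 - beta)

for beta in (0.6, 1.0):
    for n in (10_000, 1_000_000):
        print(beta, n, partial_sum(beta, n), integral_lower_bound(beta, n))
```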
For the second condition we use an upper bound on the sum:

$$\sum_{t=1}^{\infty} \frac{1}{t^{2\beta}} \;\leq\; 1 + \int_{2}^{\infty} \frac{1}{(t-1)^{2\beta}} \, dt \;=\; 1 + \left[ \frac{(t-1)^{1-2\beta}}{1-2\beta} \right]_{2}^{\infty} \;<\; \infty$$
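Again, a small numerical illustration (not part of the proof) shows the behaviour of the sum of squared learning rates; the finite bound $1 + \frac{1}{2\beta - 1}$ used below is what the integral above evaluates to for $\beta > \frac{1}{2}$.

```python
# Illustrative check (again not a proof): for beta > 1/2 the partial sums
# of 1/t**(2*beta) stay below the finite value 1 + 1/(2*beta - 1) obtained
# from the bound above, while for beta = 1/2 they keep growing.
def partial_sum_of_squares(beta, n):
    return sum(1.0 / t ** (2.0 * beta) for t in range(1, n + 1))

for beta in (0.5, 0.6, 1.0):
    bound = 1.0 + 1.0 / (2.0 * beta - 1.0) if beta > 0.5 else float("inf")
    print(beta, partial_sum_of_squares(beta, 1_000_000), bound)
```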
From this follows: $\beta > \frac{1}{2}$ (for $\beta \leq \frac{1}{2}$ the exponent $1 - 2\beta$ is non-negative and the integral diverges).
Hence we can use the following family of methods for decreasing the learning rate: $f(t) = \frac{1}{t^{\beta}}$ with $\frac{1}{2} < \beta \leq 1$. Since in general we want to choose the learning rate as high as possible, $\beta$ should be chosen close to $\frac{1}{2}$. In practical simulations constant learning rates are often used (i.e., $\beta = 0$), but this may prohibit convergence.
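As a usage sketch, such a schedule can be plugged into a tabular Q-learning update with one visit counter per state/action pair. The code below is our own illustration under assumed names and values (the discount factor, $\beta = 0.6$, and the environment interface), not code from the thesis.

```python
from collections import defaultdict

GAMMA = 0.95   # discount factor (assumed value)
BETA = 0.6     # 1/2 < beta <= 1; closer to 1/2 keeps the learning rate high

Q = defaultdict(float)      # (state, action) -> value estimate
visits = defaultdict(int)   # (state, action) -> visit count t

def q_update(state, action, reward, next_state, actions):
    """One tabular Q-learning update with the annealed learning rate."""
    visits[(state, action)] += 1
    alpha = 1.0 / visits[(state, action)] ** BETA   # alpha_t = 1 / t**beta
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + GAMMA * best_next - Q[(state, action)])
```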