
Almost Sure Approximations to the Robbins-Monro and
Kiefer-Wolfowitz Processes with Dependent Noise
David Ruppert^1
Abbreviated Title:
RM and KW Processes
We study a recursive algorithm which includes the multidimensional Robbins-Monro and Kiefer-Wolfowitz processes. The assumptions on the disturbances are weaker than the usual assumption that they be a martingale difference sequence. It is shown that the algorithm can be represented as a weighted average of the disturbances. This representation can be used to prove asymptotic results for stochastic approximation procedures. As an example, we approximate the one-dimensional Kiefer-Wolfowitz process almost surely by Brownian motion, and as a byproduct obtain a law of the iterated logarithm.
Key Words and Phrases: Stochastic approximation, Robbins-Monro process, Kiefer-Wolfowitz process, dependent random variables, almost sure invariance principle.
^1 Research supported by National Science Foundation, Grant NSF MCS78-01240.
AMS 1970 subject classification. Primary 62L20; Secondary 60F15.
1. Introduction. The recursive algorithm

(1.1)    X_{n+1} = X_n - a_n(f(X_n) + e_n + \beta_n),    n = 1, 2, ...,

where e_n and \beta_n are random vectors, f is a function (possibly unknown) from R^k to R^k, and a_n is a positive random variable which converges to 0 almost surely, has been studied by Kushner (1977) and Ljung (1978).
They have shown that (1.1) includes the Robbins-Monro (RM) (1951) and Kiefer-Wolfowitz (KW) (1952) stochastic approximation processes, which are methods for locating roots of

(1.2)    f(x) = 0.
With the KW process, our goal is to locate a point in R^k where the unknown, real-valued function V attains a local maximum. It is assumed that for each x in R^k we can observe not V(x), but rather V(x) plus additive noise. To locate such a point, we look for a solution to (1.2) with f equal to the gradient of V. The KW algorithm is given by (1.1) with \beta_n equal to the error which results from approximating f by finite differences of values of V, and e_n equal to a function of the random errors added to the observations of V used to estimate these differences.
With the RM process, for any x in R^k we can observe f(x) plus additive noise, and the goal is to find a solution to (1.2). The RM algorithm is of the form (1.1) with \beta_n equal to 0 and e_n equal to the noise added to the observation of f(X_n).
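As a concrete illustration of these two special cases, the following minimal sketch simulates (1.1) in R^1; the quadratic objectives, the gains a_n = a/n, the finite-difference spacing cn^{-1/6}, and the Gaussian noises are assumptions made only for this illustration, not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def rm(f, x1, n_steps, a=1.0):
    # Robbins-Monro: X_{n+1} = X_n - a_n (f(X_n) + e_n), with a_n = a/n
    # and e_n the noise in the observation of f(X_n).
    x = x1
    for n in range(1, n_steps + 1):
        x -= (a / n) * (f(x) + rng.normal())
    return x

def kw(V, x1, n_steps, a=1.0, c=1.0):
    # Kiefer-Wolfowitz: the gradient of V is approximated by a central
    # difference of noisy observations of V at X_n +/- c_n, c_n = c n^{-1/6};
    # beta_n is the finite-difference bias, e_n comes from the two noises.
    x = x1
    for n in range(1, n_steps + 1):
        cn = c * n ** (-1.0 / 6.0)
        slope = (V(x + cn) + rng.normal() - V(x - cn) - rng.normal()) / (2 * cn)
        x += (a / n) * slope  # ascend toward a local maximum of V
    return x

print(rm(lambda x: 2.0 * x, x1=3.0, n_steps=100_000))            # root at 0
print(kw(lambda x: -((x - 1.0) ** 2), x1=3.0, n_steps=100_000))  # max at 1
```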
Most of the classical results for the RM and KW processes require the assumption

(1.3)    E(e_n | e_1, ..., e_{n-1}) = 0,    n = 1, 2, ...,

that is, that the disturbances form a martingale difference sequence.
Wasan (1969) discusses many of these results and has an extensive bibliography. We mention several of the major results. Blum (1954) gives sufficient conditions for X_n to converge almost surely to a solution \theta of (1.2). Chung (1954) studies the asymptotic behavior of the moments of (X_n - \theta) and uses the method of moments to prove that (X_n - \theta), suitably normalized, converges weakly to a normal distribution. Sacks (1958) proves asymptotic normality by other methods and under weaker conditions. Fabian (1968) proves a theorem which subsumes much of the earlier work on asymptotic normality.
More recently, the asymptotic behavior of X_n has been further elucidated. McLeish (1976), Nevel'son and Has'minskii ((1973), page 153), and Walk (1977) prove weak invariance principles for RM processes; the latter treats RM processes taking values in a separable real Hilbert space. Also, laws of the iterated logarithm have appeared (e.g., Gaposkin and Krasulina (1974) and Major (1973)).
Kersting (1977) approximates the one-dimensional RM process almost surely
by a weighted sum of i.i.d. random variables.
This approximation allows results
for i.i.d. random variables to be applied in a straightforward manner to the RM
process, and many of the results mentioned above are simple corollaries of
Kersting's representation.
It should be mentioned again that the above results all assume (1.3). Ljung (1978) states that in many applications assumption (1.3) is violated because the disturbances e_n are correlated. He, Kushner (1977), and Kushner and Clark (1978) have weakened (1.3) considerably. Ljung establishes only almost sure convergence, not the speed of convergence. Kushner and Clark do prove rates for weak convergence. They define a continuous time process by piecewise constant interpolation of n^{\gamma}(X_n - \theta), where \gamma depends upon a_n, \beta_n, and f. They show that, with a suitable translation of the time variable, with the translation depending upon n, this process converges weakly in (D[0,\infty))^k to the stationary solution of a stochastic differential equation.
In this paper, (1.3) is not assumed. Using techniques of Kersting (1977), we approximate X_n almost surely as a weighted average of e_1, ..., e_n. Results for X_n then follow immediately as corollaries of theorems for sums of dependent random variables. We work with a form of (1.1) which is sufficiently general to include the multi-dimensional RM and KW processes. Kersting's work is confined to the one-dimensional RM process.

The methods of this paper are different from those of Kushner and Clark. They deal with certain measures induced by X_n and prove weak convergence of these measures. We examine the sample paths of X_n and prove strong limit theorems.
In section 2 we introduce the basic model. Section 3 presents general results. These results and work of Philipp and Stout (1975) are used in section 4 to show that the RM and KW processes can be redefined on richer probability spaces, without changing their distributions, so that they are approximated almost surely by Brownian motion. Because the approximation is almost sure and is sufficiently close, it is easy to show that results about the asymptotic fluctuation behavior of Brownian motion hold also for the RM and KW processes. For example, corollary 4.1 is a law of the iterated logarithm for the KW process; it follows directly from the law of the iterated logarithm for Brownian motion.
2. Notation and assumptions. If x is a vector in R^k, let x^{(i)} be its ith coordinate, \|x\| = (\sum_{i=1}^k (x^{(i)})^2)^{1/2}, and \|x\|_\infty = \max\{|x^{(j)}|: j = 1, ..., k\}. If A is a matrix, let A^{(ij)} be its i,jth entry and A^T be its transpose. Let I_k be the k \times k identity matrix. All relations between random variables are meant to hold almost surely.
The following assumptions define our basic model, which is a special case of algorithm (1.1).
A1. Suppose 0 < \tau < 1/4, X_1 is in R^k, and

    X_{n+1} = X_n - n^{-1}(f(X_n) + n^{-2\tau}\beta_n + n^{\tau}e_n)

for n \ge 1, where e_n and \beta_n are random vectors in R^k.

A2. Suppose \beta is a vector in R^k and \beta_n \to \beta.

A3. Let D be a k \times k symmetric positive definite matrix and

    f(x) = Dx + O(\|x\|^{1+\eta})  as x \to 0,  \eta > 0.

Let \lambda_k \ge \lambda_{k-1} \ge ... \ge \lambda_1 be the eigenvalues of D.

A4. Suppose that for every \delta > 0, \sum_{n=1}^\infty n^{-(1/2+\delta)} e_n converges.

A5. Assume X_n \to 0.

A6. Let \rho > 0 and assume \beta_n = \beta + O(n^{-\rho}).
Remarks. Assumption A4 holds, for example, when e_n is a martingale difference sequence with uniformly bounded second moments, and McLeish (1975) has theorems which imply A4 under a variety of mixing plus moment conditions. See, especially, his theorem 2.10. If (1.2) has a unique solution, then for convenience we take it to be 0, and under conditions given by Ljung (1978) assumption A5 holds.
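A minimal simulation sketch of the recursion in A1 may help fix ideas; every concrete choice below (the matrix D, \tau, \beta_n, and i.i.d. Gaussian e_n) is an assumption made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the recursion in A1 with k = 2; D, tau, beta_n, and the
# Gaussian noise are illustrative assumptions.
k, tau, n_steps = 2, 1.0 / 6.0, 200_000
D = np.array([[2.0, 0.3], [0.3, 1.5]])  # symmetric positive definite (A3)
beta = np.array([0.5, -0.2])

def f(x):
    # f(x) = Dx + O(||x||^{1+eta}) near 0, here with eta = 2 (A3)
    return D @ x + 0.1 * (x @ x) * x

x = np.array([1.0, -1.0])
for n in range(1, n_steps + 1):
    e_n = rng.normal(size=k)        # i.i.d. mean-zero noise satisfies A4
    beta_n = beta + n ** (-0.5)     # beta_n = beta + O(n^{-1/2}) (A2, A6)
    x = x - (f(x) + n ** (-2 * tau) * beta_n + n ** tau * e_n) / n

print(np.linalg.norm(x))  # A5 posits X_n -> 0; here the norm is indeed small
```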
N1. Define \gamma = \min(2\tau, 1/2 - \tau) or 1/2 - \tau according as P(\beta_n \ne 0 infinitely often) > 0 or not.

N2. Let P be an orthogonal matrix such that P^T D P = diag(\lambda_i). For any x in R^k let \tilde{x} = P^T x (so that \tilde{X}_n, \tilde{e}_n, and \tilde{\beta}_n denote P^T X_n, P^T e_n, and P^T \beta_n).
3. General results.
Lemma 3.1. Assume A1 to A5. Suppose 0 < \delta < \min(\gamma, \lambda_1) and define Y_n = (n-1)^{\delta} \tilde{X}_n. Then n^{\delta} X_n \to 0.

Proof. Since (n(n-1)^{-1})^{\delta} = 1 + \delta n^{-1} + O(n^{-2}), it follows from A1, A3, A5, and N2 that

    Y_{n+1} = Y_n - n^{-1}(diag(\lambda_i) - \delta I_k + B_n)Y_n - n^{-1+\delta-2\tau}\tilde{\beta}_n - n^{-1+\delta+\tau}\tilde{e}_n,

where the matrix B_n satisfies B_n = O(\|X_n\|^{\eta}) = o(1). Since \beta_n = O(1) and \delta < 2\tau if P(\beta_n \ne 0 i.o.) > 0 (while \tilde{\beta}_n = 0 eventually otherwise),

    \sum_{n=1}^\infty n^{-1+\delta-2\tau} \tilde{\beta}_n converges.

By A4, since \delta < 1/2 - \tau,

    \sum_{n=1}^\infty n^{-1+\delta+\tau} \tilde{e}_n converges.

Therefore, if we set \hat{\lambda}_i = \lambda_i - \delta,

(3.1)    Y_{n+1} = Y_n - n^{-1}(diag(\hat{\lambda}_i) + B_n)Y_n + d_n,

where \sum_{n=1}^\infty d_n converges. Let \Omega^* be the subset of the probability space on which B_n \to 0 and \sum_{n=1}^\infty d_n converges. Thus P(\Omega^*) = 1. Fix \omega in \Omega^* and until the end of the proof write X instead of X(\omega) for any random variable X. To complete the proof we need only show that

(3.2)    \limsup \|Y_n\| < \infty.

Choose \epsilon > 0, and then choose N such that

(3.3)    \sup_{n \ge 1} \|\sum_{k=N}^{N+n} d_k\| < \epsilon

and

(3.4)    \sup\{|B_n^{(ij)}|: n \ge N and i, j = 1, ..., k\} < \hat{\lambda}_1/(2k).
If (3.2) does not hold, then there exist j, M, n_1, and n_2 such that N < n_1 < n_2, M > \epsilon,

(3.5)    |Y_{n_2}^{(j)}| > M + 2\epsilon,

and

(3.6)    |Y_{n_1}^{(j)}| < M.

We treat the case where Y_{n_2}^{(j)} > 0; the other case is similar. Let

    m = \sup\{n < n_2 : Y_n^{(j)} < M\}.

Then m \ge n_1. By (3.1), (3.3), (3.5), and (3.6),

    Y_{n_2}^{(j)} = Y_{m+1}^{(j)} - \sum_{i=m+1}^{n_2-1} i^{-1}((diag(\hat{\lambda}_l) + B_i)Y_i)^{(j)} + \sum_{i=m+1}^{n_2-1} d_i^{(j)} \le Y_{m+1}^{(j)} - \sum_{i=m+1}^{n_2-1} i^{-1}((diag(\hat{\lambda}_l) + B_i)Y_i)^{(j)} + \epsilon.

Therefore, by (3.4), Y_{n_2}^{(j)} \le Y_{m+1}^{(j)} + \epsilon \le M + 2\epsilon, which contradicts (3.5) since Y_{n_2}^{(j)} > 0. \Box
Define \delta(x,y) to equal 0 or 1 according as x \ne y or x = y.

Theorem 3.1. Assume A1 to A6 and \lambda_1 > \gamma. Then there exists \epsilon > 0 such that for i = 1, ..., k,

    n^{\gamma} \tilde{X}_{n+1}^{(i)} = -\tilde{\beta}^{(i)} (\lambda_i - 2\tau)^{-1} \delta(\gamma, 2\tau) - n^{-1/2} \sum_{k=1}^n (k/n)^{\lambda_i+\tau-1} \tilde{e}_k^{(i)} \delta(\gamma, 1/2 - \tau) + O(n^{-\epsilon}).
Proof. By A1, A3, N2, and lemma 3.1, we may write

    \tilde{X}_{n+1}^{(i)} = \tilde{X}_n^{(i)} - n^{-1}((\lambda_i + b_n^{(i)}) \tilde{X}_n^{(i)} + n^{-2\tau} \tilde{\beta}_n^{(i)} + n^{\tau} \tilde{e}_n^{(i)}),

where for each 0 < \delta < \min(\gamma, \lambda_1)

(3.7)    \|b_n\| = O(n^{-\eta\delta}).

Fix i \le k and let V_n = (n-1)^{\lambda_i} \tilde{X}_n^{(i)}. Then

(3.8)    V_{n+1} = V_n + (v_n - n^{-1} b_n^{(i)}) V_n - n^{-1+\lambda_i}(n^{-2\tau} \tilde{\beta}_n^{(i)} + n^{\tau} \tilde{e}_n^{(i)}),

where v_n = O(n^{-2}). Iterating (3.8) yields

(3.9)    V_{n+1} = V_2 + \sum_{j=2}^n (v_j - j^{-1} b_j^{(i)}) V_j - \sum_{j=2}^n j^{-1+\lambda_i-2\tau} \tilde{\beta}_j^{(i)} - \sum_{j=2}^n j^{-1+\lambda_i+\tau} \tilde{e}_j^{(i)}.

Since \lambda_i \ge \lambda_1 > \gamma, it follows from (3.7), lemma 3.1, and v_n = O(n^{-2}) that we can choose \alpha satisfying 0 < \alpha < 2(\lambda_i - \gamma) and

    n^{-\lambda_i+\gamma}(v_n - n^{-1} b_n^{(i)}) V_n = O(n^{-1-\alpha}).

Therefore,

    \sum_{n=1}^\infty |(v_n - n^{-1} b_n^{(i)}) V_n| n^{-\lambda_i+\gamma+\alpha/2} < \infty.

Since -\lambda_i + \gamma + \alpha/2 < 0, Kronecker's lemma implies that

(3.10)    n^{-\lambda_i+\gamma} \sum_{j=1}^n (v_j - j^{-1} b_j^{(i)}) V_j = O(n^{-\alpha/2}).

Using A6 and an argument similar to the one which established (3.10), we obtain

    n^{-\lambda_i+\gamma} \sum_{j=2}^n j^{-1+\lambda_i-2\tau}(\tilde{\beta}_j^{(i)} - \tilde{\beta}^{(i)}) = O(n^{-\rho_2})

for some \rho_2 > 0. Furthermore, it is easy to show that

    n^{-\lambda_i+\gamma} \sum_{j=2}^n j^{-1+\lambda_i-\gamma} = (\lambda_i - \gamma)^{-1} + O(n^{-\lambda_i+\gamma}).

By N1, if P(\beta_n \ne 0 i.o.) = 0, then with probability one \tilde{\beta}_j^{(i)} = 0 for all j sufficiently large, while if \gamma < 2\tau the sum below is O(n^{\gamma-2\tau}) + O(n^{-\lambda_i+\gamma}). Thus, for some \epsilon > 0,

(3.11)    n^{-\lambda_i+\gamma} \sum_{j=1}^n j^{-1+\lambda_i-2\tau} \tilde{\beta}_j^{(i)} = \tilde{\beta}^{(i)}(\lambda_i - 2\tau)^{-1} \delta(2\tau, \gamma) + O(n^{-\epsilon}).

By (3.9) to (3.11), for some \epsilon > 0,

(3.12)    n^{-\lambda_i+\gamma} V_{n+1} = -\tilde{\beta}^{(i)}(\lambda_i - 2\tau)^{-1} \delta(2\tau, \gamma) - n^{-\lambda_i+\gamma} \sum_{j=1}^n j^{-1+\lambda_i+\tau} \tilde{e}_j^{(i)} + O(n^{-\epsilon}).

If \gamma \ne 1/2 - \tau, then by N1, \gamma = 2\tau < 1/2 - \tau. By A4,

    \sum_{j=1}^\infty j^{-(1/2+\mu)} \tilde{e}_j^{(i)} converges for every \mu > 0,

so by Kronecker's lemma (applied when \lambda_i + \tau + \mu > 1/2; otherwise the sum below converges outright),

    \sum_{j=1}^n j^{-1+\lambda_i+\tau} \tilde{e}_j^{(i)} = O(n^{\lambda_i+\tau-1/2+\mu}) + O(1).

Since \gamma < 1/2 - \tau, choosing \mu small enough shows that for some \epsilon > 0,

(3.13)    n^{\gamma-\lambda_i} \sum_{j=1}^n j^{-1+\lambda_i+\tau} \tilde{e}_j^{(i)} = n^{1/2-\tau-\lambda_i} \sum_{j=1}^n j^{-1+\lambda_i+\tau} \tilde{e}_j^{(i)} \delta(\gamma, 1/2 - \tau) + O(n^{-\epsilon}).

Since n^{\gamma-\lambda_i} V_{n+1} = n^{\gamma} \tilde{X}_{n+1}^{(i)} and n^{1/2-\tau-\lambda_i} \sum_{j=1}^n j^{-1+\lambda_i+\tau} \tilde{e}_j^{(i)} = n^{-1/2} \sum_{j=1}^n (j/n)^{\lambda_i+\tau-1} \tilde{e}_j^{(i)}, substituting (3.13) into (3.12) completes the proof. \Box
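In dimension k = 1 (where \tilde{X} = X) the representation of theorem 3.1 is easy to check numerically. The sketch below does so under assumed concrete choices: f(x) = \lambda x, a constant bias sequence, i.i.d. N(0,1) noise, and \tau = 1/6, so that \gamma = 1/3 = 2\tau = 1/2 - \tau and \lambda_1 = \lambda = 1 > \gamma.

```python
import numpy as np

rng = np.random.default_rng(6)

# One-dimensional check of theorem 3.1. Assumed setup: f(x) = lam*x,
# beta_n = beta constant, e_n i.i.d. N(0,1), tau = 1/6, so that
# gamma = min(2*tau, 1/2 - tau) = 1/3 and both delta-terms are active.
lam, beta, tau = 1.0, 0.8, 1.0 / 6.0
gamma, n = 1.0 / 3.0, 400_000
e = rng.normal(size=n + 1)

x = 0.5
for j in range(1, n + 1):
    x -= (lam * x + j ** (-2 * tau) * beta + j ** tau * e[j]) / j

k = np.arange(1, n + 1, dtype=float)
rep = -beta / (lam - 2 * tau) - ((k / n) ** (lam + tau - 1) * e[1:]).sum() / np.sqrt(n)
print(n ** gamma * x, rep)  # the two values should nearly agree
```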
4. The one-dimensional RM and KW processes.

Theorem 3.1 enables us to use theorems for sums of dependent random variables to prove theorems for stochastic approximation processes with dependent noise. In this section we apply the work of Philipp and Stout (1975) to the RM and KW processes in R^1. Their monograph gives sufficient conditions so that a sequence of random variables e_n can be redefined on a richer probability space, without changing its distribution, together with a Brownian motion B(t) on [0, \infty) such that for some \epsilon > 0,

(4.1)    \sum_{k \le t} e_k = B(t) + O(t^{1/2-\epsilon}).
For example, suppose e_k is a strictly stationary \phi-mixing process such that \sum_{n=1}^\infty (\phi(n))^{1/2} < \infty. (See Philipp and Stout, page 26, for the definition of \phi-mixing.) If Ee_1 = 0 and E|e_1|^{2+\delta} < \infty for some \delta > 0, then \lim_{n\to\infty} n^{-1} E(\sum_{k=1}^n e_k)^2 exists (Philipp and Stout, page 26). Call this limit \sigma^2. Suppose that \sigma^2 > 0. Then without loss of generality \sigma^2 = 1 can be assumed, and by their theorem 4.1, (4.1) holds. We will be interested in the asymptotic behavior of \sum_{k \le t} k^a e_k where e_k satisfies (4.1), so the following lemma is useful.
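The limit \sigma^2 is easy to estimate by Monte Carlo. The sketch below does so for disturbances driven by a stationary two-state Markov chain, a standard example of a \phi-mixing process with geometrically decaying \phi(n); the chain, its transition probabilities, and the sample sizes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# e_k = g(Y_k) for a stationary two-state Markov chain Y_k; such chains are
# phi-mixing with geometric rate, so sum (phi(n))^{1/2} < infinity.
p, q = 0.3, 0.2                           # P(0 -> 1) and P(1 -> 0) (assumed)
stationary = np.array([q, p]) / (p + q)
values = np.array([-1.0, 1.0])
values = values - stationary @ values     # center so that E e_1 = 0

def block_sum(n):
    y = rng.choice(2, p=stationary)
    s = 0.0
    for _ in range(n):
        s += values[y]
        y = rng.choice(2, p=[1 - p, p] if y == 0 else [q, 1 - q])
    return s

n, reps = 2_000, 500
est = np.mean([block_sum(n) ** 2 for _ in range(reps)]) / n
print(est)  # Monte Carlo estimate of sigma^2 = lim n^{-1} E(S_n^2)
```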
Lemma 4.1. Let e_n be a sequence of random variables. For any number a, define S_a on [0, \infty) by

    S_a(t) = \sum_{k \le t} k^a e_k.

Suppose there exist a standard Brownian motion B_0(t) on [0, \infty) and a positive number \epsilon such that

    \sum_{k \le t} e_k = B_0(t) + O(t^{1/2-\epsilon}).

Then for a < -1/2,

(4.2)    \lim_{t\to\infty} S_a(t) exists and is finite,

and for a > -1/2 there exist a standard Brownian motion B_a and a positive number \epsilon_a such that

(4.3)    S_a(t) = B_a((2a+1)^{-1} t^{2a+1}) + O(t^{a+1/2-\epsilon_a}).
Proof. If a > -1/2, define N(k,a) = \sum_{j=1}^k j^{2a} and then define B_a by B_a(0) = 0 and, for N(k-1,a) < t \le N(k,a), k = 1, 2, ...,

(4.4)    B_a(t) = B_a(N(k-1,a)) + k^a (B_0((t - N(k-1,a))/k^{2a} + (k-1)) - B_0(k-1)).

Since B_0 is a standard Brownian motion, so also is B_a.

Let a < 0 and let n be a positive integer. Define b_k^n = k^a - (k+1)^a for k = 1, ..., n-1 and b_n^n = n^a. Then, by the hypothesis of the lemma,

(4.5)    S_a(n) = \sum_{k=1}^n (\sum_{j=k}^n b_j^n) e_k = \sum_{j=1}^n b_j^n (\sum_{k=1}^j e_k) = \sum_{j=1}^n b_j^n B_0(j) + O(\sum_{j=1}^n b_j^n j^{1/2-\epsilon}),

and

(4.6)    \sum_{j=1}^n b_j^n B_0(j) = \sum_{k=1}^n (\sum_{j=k}^n b_j^n)(B_0(k) - B_0(k-1)) = \sum_{k=1}^n k^a (B_0(k) - B_0(k-1)).

Moreover,

(4.7)    \lim_{n\to\infty} \sum_{j=1}^n b_j^n j^{1/2-\epsilon} exists and is finite if a < -1/2.

Since (B_0(k) - B_0(k-1)) is an independent sequence of standard normal random variables, (4.6) implies that

(4.8)    \lim_{n\to\infty} \sum_{k=1}^n k^a (B_0(k) - B_0(k-1)) exists and is finite if a < -1/2.

By (4.5), (4.7), and (4.8), if a < -1/2 then (4.2) holds.

If a > -1/2, then

(4.9)    \sum_{j=1}^n j^{1/2-\epsilon} b_j^n = O(\sum_{j=1}^{n-1} j^{-1/2+a-\epsilon}) + n^{a+1/2-\epsilon} = O(n^{a+1/2-\rho}) for some \rho > 0.

For -1/2 < a < 0, it follows from (4.5), (4.6), and (4.9) that

(4.10)    S_a(n) = B_a(N(n,a)) + O(n^{a+1/2-\rho}),    n = 1, 2, ....

Now it will be shown that (4.10) holds for a > 0. Define c_1 = 1 and c_k = k^a - (k-1)^a for k \ge 2. Then for a > 0,

    S_a(n) = \sum_{k=1}^n (\sum_{j=1}^k c_j) e_k = \sum_{j=1}^n c_j (\sum_{k=j}^n e_k) = \sum_{j=1}^n c_j (B_0(n) - B_0(j-1) + O(n^{1/2-\epsilon})),

    \sum_{j=1}^n c_j (B_0(n) - B_0(j-1)) = \sum_{k=1}^n k^a (B_0(k) - B_0(k-1)) = B_a(N(n,a)), and

    \sum_{j=1}^n c_j n^{1/2-\epsilon} = O(n^{a+1/2-\epsilon}).

Therefore (4.10) holds for a > 0. For a = 0, (4.10) holds by assumption.

One can easily show that for a > -1/2 there exists \rho > 0 such that

(4.11)    \sum_{j \le t} j^{2a} = (2a+1)^{-1} t^{2a+1} + O(t^{2a+1-\rho}).

In the proof of their lemma 3.5.3, Philipp and Stout (1975) show that if 1 > \delta > 0 and B is a Brownian motion on [0, \infty), then for each \mu > 0,

(4.12)    B(t + O(t^{1-\delta})) = B(t) + O(t^{1/2-\delta/2+\mu}).

By (4.10) to (4.12), (4.3) holds for a > -1/2. \Box

Lemma 4.2. Suppose a_k are real numbers and \sum_{k=1}^\infty a_k converges. Let b_k^n, k = 1, ..., n and n \ge 1, be positive numbers. Suppose \sup_n b_n^n < M < \infty, and for each n suppose b_k^n \le b_{k+1}^n. Assume that for each k, b_k^n decreases to 0 as n \to \infty. Then

    \lim_{n\to\infty} \sum_{k=1}^n a_k b_k^n = 0.

Proof. Fix \epsilon > 0. Choose N such that if n > m > N, then |\sum_{k=m}^n a_k| \le \epsilon. Then choose N' \ge N such that b_k^n \le \epsilon for k = 1, ..., N and n > N'. Define c_k^n = b_k^n - b_{k-1}^n if k \ge 2 and c_1^n = b_1^n. Then for k \le n, b_k^n = \sum_{j=1}^k c_j^n with c_j^n \ge 0, so that

    |\sum_{k=N+1}^n a_k b_k^n| = |\sum_{j=1}^n c_j^n (\sum_{k=\max(j,N+1)}^n a_k)| \le (\sum_{j=1}^n c_j^n) \epsilon \le b_n^n \epsilon \le M\epsilon.

Therefore if n \ge N',

    |\sum_{k=1}^n a_k b_k^n| \le |\sum_{k=1}^N a_k b_k^n| + |\sum_{k=N+1}^n a_k b_k^n| \le (\sum_{k=1}^N |a_k|)\epsilon + M\epsilon. \Box
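The time change appearing in (4.3) is visible in a quick simulation: for i.i.d. N(0,1) noise (an assumed special case in which the hypothesis of lemma 4.1 holds), S_a(n) has variance N(n,a) = \sum_{k \le n} k^{2a}, which (4.11) approximates by (2a+1)^{-1} n^{2a+1}.

```python
import numpy as np

rng = np.random.default_rng(3)

# Scaling check for lemma 4.1: with i.i.d. N(0,1) e_k (assumed), the
# variance of S_a(n) = sum k^a e_k is close to (2a+1)^{-1} n^{2a+1}.
a, n, reps = 0.25, 5_000, 1_000
k = np.arange(1, n + 1, dtype=float)
samples = (k ** a * rng.normal(size=(reps, n))).sum(axis=1)
print(samples.var())                   # empirical Var S_a(n)
print(n ** (2 * a + 1) / (2 * a + 1))  # (2a+1)^{-1} n^{2a+1}
```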
Now we treat the KW process, and for this we need several assumptions.

A7. M is a function from R^1 to R^1. The number \alpha is the unique solution of

    \inf_x M(x) = M(\alpha)  and  M'(\alpha) = 0.

Henceforth we assume also that \alpha = 0.

A8. M has two continuous derivatives, M''' exists in a neighborhood of 0, d > 0, M'''(x) = M'''(0) + O(|x|^d) as x \to 0, \sup_x |M''(x)| < \infty, \{x: |M'(x)| \le \delta\} is compact for some \delta > 0, and \{x: M(x) \le C\} is compact for all C > M(0).

A9. Let a and c be positive. Suppose aM''(0) > 1/3.

A10. Let u_n be a sequence of random variables and define S(t) = \sum_{k \le t} u_k. Suppose \sigma^2 > 0, B(t) is a standard Brownian motion on [0, \infty), and for some \delta > 0,

    S(t) = \sigma B(t) + O(t^{1/2-\delta}).

A11. Let X_1 be a random variable and define X_n recursively by X_{n+1} = X_n - an^{-1} Y_n, where

    Y_n = (2cn^{-1/6})^{-1} (M(X_n + cn^{-1/6}) - M(X_n - cn^{-1/6}) + u_n).
Theorem 4.1. Suppose A7 to A11 hold. Let A = aM''(0) - (5/6) and B = -ac^2 M'''(0)/(6aM''(0) - 2). Then for some \epsilon > 0,

(4.13)    n^{1/3} X_{n+1} = B - n^{-1/2} \sum_{k=1}^n (k/n)^A e_k + O(n^{-\epsilon}).

Define X(t), t \ge 0, by X(t) = n^{1/3} X_n if n \le t < n + 1. Then there exist a standard Brownian motion Z(t) on [0, \infty) and \epsilon > 0 such that

(4.14)    X(t) = B + a\sigma(2c)^{-1}(2A+1)^{-1/2} t^{-A-1/2} Z(t^{2A+1}) + O(t^{-\epsilon}).
Proof. We first note that A5 holds by lemma 1 of Ljung (1978). To apply this lemma, his X(n), e(n), B(n), \gamma(n), and f(x) are set equal to our X_{n+1}, -u_n, M'(X_n) - (M(X_n + cn^{-1/6}) - M(X_n - cn^{-1/6})), a/(2cn^{5/6}), and -M'(x), respectively. Ljung's condition B1 is verified by using (4.2) of lemma 4.1, lemma 4.2, and his equation (15). After using his lemma 3 to verify his condition B2, it is clear that all conditions of his lemma 1 are satisfied.

A4 holds because of A10 and (4.2). We now show that A1, A2, and A3 hold with f(x) = aM'(x), D = \lambda_1 = aM''(0), \beta = (ac^2/6)M'''(0), e_n = au_n/(2c), and \tau = 1/6 (so that \gamma = 1/3 = 2\tau = 1/2 - \tau). By Taylor expansions,

    (M(X_n + cn^{-1/6}) - M(X_n - cn^{-1/6}))/(2cn^{-1/6}) = M'(X_n) + (c^2 n^{-1/3}/12)(M'''(\eta_n) + M'''(\rho_n)),

where |\eta_n - X_n| \le cn^{-1/6} and |\rho_n - X_n| \le cn^{-1/6}. Therefore, if we set \beta_n = (ac^2/12)(M'''(\eta_n) + M'''(\rho_n)), then

    aY_n = f(X_n) + n^{-2\tau} \beta_n + n^{\tau} e_n,

and A1 holds. A2 follows from A8. A3 holds because A8 implies that f(x) = Dx + O(x^2) as x \to 0. Thus we have shown that A1 to A5 hold. By lemma 3.1, X_n = O(n^{-\epsilon}) for an \epsilon > 0; this and A8 imply that \beta_n = \beta + O(|X_n|^d + n^{-d/6}) = \beta + O(n^{-\rho}) for some \rho > 0, and so A6 holds. We now invoke theorem 3.1 to prove that (4.13) holds.

By A10,

    \sum_{k \le t} e_k = a\sigma(2c)^{-1} B(t) + O(t^{1/2-\delta}).

Therefore, by lemma 4.1, there exist a Brownian motion Z(t) and an \epsilon > 0 such that

    \sum_{k \le t} k^A e_k = a\sigma(2c)^{-1}(2A+1)^{-1/2} Z(t^{2A+1}) + O(t^{A+1/2-\epsilon}).

This and (4.13) imply (4.14). \Box
Theorem 4.1 yields results on the asymptotic fluctuation behavior of X_n. Here is a simple example.

Corollary 4.1. Suppose A7 to A11 hold. Then

    \limsup_{n\to\infty} n^{1/3}(X_n - B)/(2 \log \log n)^{1/2} = a\sigma(2c)^{-1}(2A+1)^{-1/2}.

Proof. Straightforward. Use the law of the iterated logarithm for Brownian motion. \Box
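A small simulation makes the n^{1/3} normalization in corollary 4.1 visible. The M below, the i.i.d. Gaussian u_n (the simplest case of A10), and all parameter values are illustrative assumptions; the comparison is necessarily loose (a lim sup judged from one path at finite n), but the two magnitudes should be of the same order.

```python
import numpy as np

rng = np.random.default_rng(4)

# KW recursion of A11 with M(x) = x^2 (minimum at 0); a, c satisfy
# a*M''(0) = 2a > 1/3 as A9 requires, and u_n is i.i.d. N(0, sigma^2).
a, c, sigma, n_steps = 1.0, 1.0, 1.0, 100_000
M = lambda x: x * x

x, path = 2.0, []
for n in range(1, n_steps + 1):
    cn = c * n ** (-1.0 / 6.0)
    y = (M(x + cn) - M(x - cn) + sigma * rng.normal()) / (2.0 * cn)
    x -= (a / n) * y
    path.append(n ** (1.0 / 3.0) * x)

# For M(x) = x^2, M'''(0) = 0, so B = 0; n^{1/3} X_n should fluctuate on the
# scale sqrt(2 log log n) * a*sigma / ((2c) sqrt(2A+1)), with A = 2a - 5/6.
A = 2 * a - 5.0 / 6.0
print(max(abs(v) for v in path[-1000:]))
print(np.sqrt(2 * np.log(np.log(n_steps))) * a * sigma / (2 * c * np.sqrt(2 * A + 1)))
```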
Now we state, without proof, an analogue of theorem 4.1 for the RM process.

A13. Assume f is a function from R^1 to R^1 and 0 is the unique solution of

    f(x) = 0.

A14. Let u_n be a sequence of random variables. Suppose B(t) is a standard Brownian motion on [0, \infty), \epsilon > 0, \sigma > 0, and

    \sum_{k \le t} u_k = \sigma B(t) + O(t^{1/2-\epsilon}).

A15. X_{n+1} = X_n - an^{-1}(f(X_n) + u_n), where a > 0.

A16. f has a continuous derivative and f(x) = f'(0)x + O(|x|^{1+\epsilon}) as x \to 0 for some \epsilon > 0.

A17. Define V(x) = \int_0^x f(y) dy. Suppose \{x: V(x) \le C\} is compact for all C < \sup_x V(x).

A18. Suppose \sup_x |f'(x)| < \infty and \{x: |f(x)| \le \delta\} is compact for some \delta > 0.
Theorem 4.2. Suppose A13 to A18 hold and af'(0) > 1/2. Define A = af'(0) - 1. Then for some \epsilon > 0,

    n^{1/2} X_{n+1} = -a n^{-1/2} \sum_{k=1}^n (k/n)^A u_k + O(n^{-\epsilon}).

Define X(t), t \ge 0, by X(t) = n^{1/2} X_n on n \le t < n + 1. Then there exist a standard Brownian motion Z(t) and an \epsilon > 0 such that

    X(t) = a\sigma(2A+1)^{-1/2} t^{-A-1/2} Z(t^{2A+1}) + O(t^{-\epsilon}).

Proof. Similar to the proof of theorem 4.1. \Box
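One distributional consequence of theorem 4.2 is that n^{1/2} X_n is asymptotically normal with variance a^2\sigma^2/(2af'(0) - 1), and this is easy to check by simulation; the linear f, i.i.d. N(0,1) noise (an assumed special case of A14), and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# RM recursion of A15, vectorized over independent replications, with
# f(x) = x (so f'(0) = 1) and a = 1, hence a f'(0) = 1 > 1/2.
a, n_steps, reps = 1.0, 50_000, 2_000
x = np.ones(reps)
for n in range(1, n_steps + 1):
    x -= (a / n) * (x + rng.normal(size=reps))

# Theorem 4.2 suggests sqrt(n) X_n is approximately N(0, a^2 sigma^2 /
# (2 a f'(0) - 1)) = N(0, 1) here, so the sample variance should be near 1.
print(np.var(np.sqrt(n_steps) * x))
```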
References

Blum, J.R. (1954). Approximation methods which converge with probability one. Ann. Math. Statist. 25 382-386.

Chung, K.L. (1954). On a stochastic approximation method. Ann. Math. Statist. 25 463-483.

Fabian, V. (1968). On asymptotic normality in stochastic approximation. Ann. Math. Statist. 39 1327-1332.

Gaposkin, V.F. and Krasulina, T.P. (1974). On the law of the iterated logarithm in stochastic approximation processes. Theor. Probability Appl. 19 844-850.

Kersting, G. (1977). Almost sure approximation of the Robbins-Monro process by sums of independent random variables. Ann. Probability 5 954-965.

Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function. Ann. Math. Statist. 23 462-466.

Kushner, H.J. (1977). General convergence results for stochastic approximations via weak convergence theory. J. Math. Anal. and Applic. 61 490-503.

Kushner, H.J. and Clark, D.S. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems. Springer-Verlag, New York.

Ljung, L. (1978). Strong convergence of a stochastic approximation algorithm. Ann. Statist. 6 680-696.

Major, P. (1973). A law of iterated logarithm for the Robbins-Monro method. Studia Sci. Math. Hungar. 8 92-102.

McLeish, D.L. (1975). A maximal inequality and dependent strong laws. Ann. Probability 3 829-839.

McLeish, D.L. (1976). Functional and random central limit theorems for the Robbins-Monro process. J. Appl. Probability 13 148-154.

Nevel'son, M.B. and Has'minskii, R.Z. (1973). Stochastic Approximation and Recursive Estimation. Trans. of Math. Monographs Vol. 47, Amer. Math. Soc., Providence, Rhode Island.

Philipp, W. and Stout, W. (1975). Almost sure invariance principles for partial sums of weakly dependent random variables. Amer. Math. Soc. Mem. No. 161. Amer. Math. Soc., Providence, Rhode Island.

Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math. Statist. 22 400-407.

Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures. Ann. Math. Statist. 29 373-405.

Walk, H. (1977). An invariance principle for the Robbins-Monro process in a Hilbert space. Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 39 135-150.

Wasan, M.T. (1969). Stochastic Approximation. Cambridge Univ. Press, New York.