ON SEQUENTIAL ESTIMATION OF THE RENEWAL FUNCTION, OPTIMAL
BLOCK REPLACEMENT POLICIES, AND FIXED WIDTH CONFIDENCE BANDS
by
John Crowell
A Dissertation submitted to the faculty of The University of North Carolina at Chapel Hill in partial
fulfillment of the requirements for the degree of Doctor of
Philosophy in the Department of Statistics.
Chapel Hill
1990
JOHN CROWELL.
On Sequential Estimation of the Renewal Function, Optimal Block
Replacement Policies, and Fixed Width Confidence Bands.
(Under the direction of
PRANAB K. SEN.)
ABSTRACT
Stochastic approximation methodology is applied to the problem of estimating an
optimal block replacement policy.
Several improvements to minimize the asymptotic
mean square error of the estimator are considered. A stopping time is then defined with
the objective of obtaining a fixed-width confidence interval for the optimal policy.
A nonparametric estimator is used for sequential estimation of the renewal
function at a single point and over a bounded interval of real numbers. Several strongly
consistent estimates of the asymptotic variance are given.
Sequential fixed-width
confidence interval methods are shown to be effective for estimating the renewal function
at a point. A stopping time is defined for minimum (integrated) risk estimation over an
interval.
It is shown that bootstrap methods can be used to construct asymptotically
correct confidence bands for the renewal function over an interval.
Let R be a statistical function which depends on a distribution F and is defined on
an interval of real numbers. Suppose R may be estimated using a random sample from F.
Motivated by the problem of a fixed-width confidence band for R, a stopping time is
defined that is analogous to those available for fixed-width confidence interval estimation
of a real-valued parameter. The proposed stopping time can be used in conjunction with
a bootstrap confidence band procedure. Sufficient conditions are given for the sequential
confidence band to have the correct asymptotic coverage probability as the desired band
width goes to zero. Several properties of the stopping time, such as asymptotic efficiency,
are addressed with reference to bootstrap procedures. Stopping times for variable width
confidence bands are also considered.
ACKNOWLEDGEMENTS
First and foremost, thanks go to my adviser P. K. Sen for his guidance in the
preparation of this dissertation. I hope that if I ever direct a student I will be as helpful
as he was.
I would also like to express my appreciation to the members of my committee
( I. M. Chakravarti, M. R. Leadbetter, Gordon Simons, and Walter L. Smith) for their
assistance, as well as to the entire faculty of the Department of Statistics for their
encouragement during my studies.
Thanks also go to the staff, in particular June Maxwell for so many different
things and Lee Trimble for the valiant typing of parts of this dissertation.
The constant support of my parents and family are something for which I am
always very grateful.
Final thanks go to my fellow students for their support and friendship, especially
the one who did not want to be mentioned and my officemate of four years, Prakash
Patil.
TABLE OF CONTENTS

LIST OF FIGURES
LIST OF ABBREVIATIONS
LIST OF SYMBOLS

Chapter
I. INTRODUCTION
    1.1 Introduction
    1.2 Three Separate Problems
        1.2.1 An optimization problem in reliability theory
        1.2.2 Problem: sequential estimation of the renewal function
        1.2.3 The question of sample size for fixed-width confidence bands
    1.3 Literature Review
        1.3.1 Stochastic approximation
        1.3.2 Estimating an optimal age replacement policy
        1.3.3 Estimation of the renewal function
    1.4 Summary
        1.4.1 Chapters 2 and 3
        1.4.2 Chapters 4 and 5
        1.4.3 Chapter 6
II. ESTIMATING AN OPTIMAL BLOCK REPLACEMENT POLICY
    2.1 Introduction
    2.2 The Basic Algorithm
    2.3 Proofs
    2.4 Random Block Replacement Policies
    2.5 A Critical Assumption
III. REFINEMENTS AND A FULLY SEQUENTIAL PROCEDURE
    3.1 Introduction
    3.2 Improving The Rate Of Convergence
    3.3 The Optimal Tuning Constants
    3.4 An Adaptive Stochastic Approximation Algorithm
    3.5 A Fixed-Width Confidence Interval For The Optimal Block Replacement Policy
IV. RENEWAL FUNCTION ESTIMATION AT A POINT
    4.1 Introduction
    4.2 A Correction And Some Generalizations
    4.3 An Invariance Principle
    4.4 Estimating The Variance
        4.4.1 Frees' variance estimator
        4.4.2 A jackknife estimator of variance
    4.5 Fixed-Width Confidence Intervals
V. RENEWAL FUNCTION ESTIMATION ON AN INTERVAL
    5.1 Introduction
    5.2 Minimum Risk Estimation
    5.3 An Invariance Principle For Interval Estimation
        5.3.1 Asymptotic normality of finite-dimensional distributions
        5.3.2 Tightness of the process
    5.4 A Sequential Bootstrap Confidence Band
    5.5 A Variable-Width Bootstrap Confidence Band
VI. STOPPING TIMES FOR CONFIDENCE BANDS
    6.1 Introduction
    6.2 Asymptotic Coverage Of The Confidence Band
    6.3 Properties Of The Stopping Time
    6.4 Bootstrap Quantile Estimators
    6.5 Variable Width Confidence Bands
    6.6 Example: Empirical Processes
APPENDIX
BIBLIOGRAPHY
LIST OF FIGURES

Figure 2.1(a): Gamma(1,2)
Figure 2.1(b): Gamma(1,4)
Figure 2.1(c): Gamma(1,6)
LIST OF ABBREVIATIONS

ARP       age replacement policy
a.s.      almost surely
BRP       block replacement policy
i.i.d.    independent and identically distributed
SA        stochastic approximation
u.c.i.p.  uniformly continuous in probability
LIST OF SYMBOLS

$\chi(\cdot)$        the indicator function of a set or proposition
$\to_D$              weak convergence (convergence in distribution)
$\to_w$              weak convergence (convergence in distribution)
$=_D$                equal in distribution
$a \wedge b$         minimum of the numbers a and b
$a \vee b$           maximum of the numbers a and b
$\Delta a_n$         difference operator for sequences: $\Delta a_n = a_{n+1} - a_n$
$[\,\cdot\,]$        greatest integer function
$E^{\mathcal{F}}$    conditional expectation given the $\sigma$-field $\mathcal{F}$
CHAPTER 1
INTRODUCTION
1.1 Introduction
This thesis addresses three separate problems which are introduced and motivated in
Section 1.2. A unifying theme is that all three problems are approached from a
sequential point of view. An additional link between two of these problems is that both
involve the renewal function. Some background and literature review are given in
Section 1.3. The final section of this chapter contains a summary of the remaining
chapters.
1.2 Three Separate Problems
1.2.1 An optimization problem in reliability theory
An important topic in reliability theory is maintenance policies for systems
(mechanical, electrical, etc.) whose components are subject to stochastic failure. Often a
primary consideration in maintaining such a system is the cost associated with failure of
the system, above and beyond the actual cost of subsequent repair. For instance, if the
system is some manufacturing process, then there is the productivity lost between the
time of system breakdown and the restoration of production.
A partial remedy for this
may be to carry out a preventive maintenance plan where aging parts are repaired or
replaced according to some schedule.
While such a program may diminish the number
of system failures or system downtime, it will increase routine maintenance costs since
some parts will be replaced or attended to before failure.
Ideally, one would like to
minimize costs by balancing these two factors.
More specifically, consider a system consisting of a single component or part.
Assume that this part can be replaced by a like part at any time and is replaced
following failure. Suppose all parts used have independent and identically distributed
lifetimes with distribution F. Further assume that the cost $C_1$ of having to
unexpectedly replace a failed part is greater than the cost $C_2$ of making a replacement
at a planned time. If F has increasing failure rate, then it may be less expensive to
substitute new parts for aging parts near failure than to just make replacements
following failure.
Two maintenance plans motivated by this consideration are the block
replacement policy (BRP) and the age replacement policy (ARP). Both of these policies
are treated in detail by Barlow and Proschan (1965, 1975), where some historical
development is also given.
We consider the most basic BRP here.
The ARP is
described in Section 1.3.2.
Suppose a new part is installed at time 0. In the simplest BRP, planned
replacements are made at times T, 2T, 3T, ..., where T is a positive constant;
replacements are also made following any failures. It is assumed all replacements are
made instantaneously. For this BRP, the expected long-run cost per unit time as a
function of T is

$$B(T) \;=\; \frac{C_1 H(T) + C_2}{T}\,,$$

where H is the renewal function associated with the lifetime distribution F. (See Section
1.2.2.) Thus H(T) is the expected number of failures for each of the intervals (0, T],
(T, 2T], etc.
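For a concrete feel for B(T), the following minimal Python sketch estimates H(T) by simulating renewal paths and then evaluates the cost on a grid; the Gamma lifetimes and the costs $C_1 = 5$, $C_2 = 1$ are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def renewal_count(t, sampler, n_paths=20000):
    """Monte Carlo estimate of H(t) = E N(t) for the renewal process
    whose i.i.d. lifetimes are drawn by `sampler`."""
    total = 0
    for _ in range(n_paths):
        s, n = 0.0, 0
        while True:
            s += sampler()
            if s > t:
                break
            n += 1
        total += n
    return total / n_paths

def brp_cost(T, c1, c2, sampler):
    """Expected long-run cost per unit time B(T) = (c1*H(T) + c2)/T."""
    return (c1 * renewal_count(T, sampler) + c2) / T

# Illustration: Gamma(shape 2) lifetimes; failure replacement 5x as costly.
sampler = lambda: rng.gamma(2.0, 1.0)
for T in (1.0, 2.0, 3.0, 4.0):
    print(T, brp_cost(T, c1=5.0, c2=1.0, sampler=sampler))
```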
Suppose there exists a finite positive value $\phi^*$ which minimizes B(T) in T; $\phi^*$ is
referred to as the optimal BRP since $T = \phi^*$ results in the minimum possible expected
cost per unit time. If the lifetime distribution F or associated renewal function H were
known, then finding $\phi^*$ would be a deterministic problem. Here both F and H are taken
to be unknown and nonparametric estimation of $\phi^*$ is undertaken.
Frees (1983) applied stochastic approximation methodology to the problem of sequential
estimation of an optimal ARP. (See Section 1.3.2.) In Chapters 2 and 3 we consider
using similar techniques for sequential estimation of the optimal BRP $\phi^*$.
1.2.2 Problem: sequential estimation of the renewal function
Let $X_1, X_2, \ldots$ be independent and identically distributed (i.i.d.) nonnegative
random variables with distribution function F and positive mean $\mu_1$. Let
$S_n = X_1 + X_2 + \cdots + X_n$, the time of the n-th renewal. Denote by N(t) the largest n
such that $S_n \le t$. If $X_1 > t$, then by convention N(t) = 0. The renewal function is then
H(t) = E N(t), the expected number of renewals up to and including time t. Noting that
$N(t) = \sum_{k=1}^{\infty} \chi(S_k \le t)$, it follows that

(1.1)    $$H(t) \;=\; \sum_{k=1}^{\infty} F^{(k)}(t)\,,$$

where $F^{(k)}$ is the k-th convolution of F.
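As a quick numerical check of (1.1), under an assumption chosen purely for convenience: for Exponential($\lambda$) lifetimes the k-fold convolution $F^{(k)}$ is a Gamma(k, 1/$\lambda$) distribution and $H(t) = \lambda t$ exactly, so the series can be summed and compared (this sketch assumes scipy is available).

```python
from scipy.stats import gamma

# For Exponential(rate lam) lifetimes, F^(k) is Gamma(k, scale=1/lam),
# and the renewal function is exactly H(t) = lam * t.
lam, t = 2.0, 3.0
series = sum(gamma.cdf(t, a=k, scale=1 / lam) for k in range(1, 200))
print(series, lam * t)   # both approximately 6.0
```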
The renewal function is important in a number of areas of application. In Section
1.2.1 it was seen that the cost of a BRP depends on the renewal function associated
with F. More generally, the renewal function is useful in reliability for characterizing the
expected number of parts necessary to keep a system in repair. (See Barlow and
Proschan (1965).) Business applications include warranty analysis and inventory
policies.
For large values of t the behaviour of the renewal function is well understood.
For instance, assuming F is nonarithmetic and has finite second moment $\mu_2 = E X_1^2$,
Smith (1954) showed that

(1.2)    $$H(t) - \frac{t}{\mu_1} \;\to\; \frac{\mu_2}{2\mu_1^2} - 1 \qquad \text{as } t \to \infty\,.$$

This result provides a means for estimating H(t) for large values of t from the i.i.d.
sample $X_1, X_2, \ldots, X_n$ by first estimating $\mu_1$ and $\mu_2$. (See Section 1.3.3.)
Motivated by (1.1), Frees (1986i, ii) suggested a nonparametric estimator of the
renewal function useful for applications such as those previously mentioned, where it may
be useful to have knowledge of H(t) for small and moderate values of t. This estimator
is also applicable when estimating H(t) for large values of t when F does not have a
finite second moment. Chapter 4 addresses sequential estimation of H(t) at a single
point t using Frees' estimator. A sequential approach to estimating the function H on
the interval [0, T], $0 < T < \infty$, is taken in Chapter 5.
1.2.3 The question of sample size for fixed-width confidence bands
In many applications one would like to estimate a statistical function $R_F$, where
$R_F$ depends on a distribution function F. For instance, there might be interest in
estimating the renewal function on the interval [0, T] as discussed in Section 1.2.2.
Other examples in reliability theory include estimation of the mean and median residual
life functions for a lifetime distribution F. The function $R_F$ could be the distribution F
or the quantile function associated with F. When estimating such functions, one might
want to construct a confidence band of a prescribed width which contains the function
of interest with high probability.
Assume that given a random sample $X_1, X_2, \ldots, X_n$ from the distribution F
there exists a natural estimator $R_n(\cdot) = R_n(\cdot\,; X_1, X_2, \ldots, X_n)$ of $R_F$. Suppose the
stochastic process $r_n = \{\, n^{1/2}\big( R_n(t) - R_F(t) \big) \,\}$ is such that $r_n \to_D G_F$,
where $G_F$ is a Gaussian process. A number of authors have exploited weak convergence
of $r_n$ to construct confidence bands for a variety of choices of $R_F$, typically by
bootstrapping or simulating the process $r_n$. Usually it is not clear a priori what sample
size n is sufficient to achieve approximately the desired level of precision and
confidence. This suggests approaching the problem from a sequential point of view,
which is done in Chapter 6.
1.3 Literature Review
1.3.1 Stochastic approximation
Let M(x) be a regression function on the real line. Suppose that for any x we may
observe the random variable Y(x), an unbiased estimate of M(x). Robbins and Monro
(1951) introduced stochastic approximation (SA) as a technique for finding the
(assumed) unique root $\theta$ of the equation $M(x) = \alpha$. (Without loss of generality we take
$\alpha = 0$.) Let $\{a_n\}$ be a decreasing sequence of positive constants such that
$\lim_{n\to\infty} a_n = 0$. Given $X_1, X_2, \ldots, X_n$, successive estimates of $\theta$, let $Y_n$ be
conditionally unbiased for $M(X_n)$. Robbins and Monro (R-M) proposed the algorithm
$X_{n+1} = X_n - a_n Y_n$, where $X_{n+1}$ is the next estimate of $\theta$. Assuming that the $a_n$ are
of type 1/n and the random variable Y(x) is a.s. bounded for any x, they provided
several sets of sufficient conditions on M for weak convergence of $X_n$ to $\theta$.
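A minimal sketch of the R-M iteration on a toy regression function (the function, the noise, and the step constant are hypothetical choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(1)

def robbins_monro(y, x0, n_iter=5000, a=1.0):
    """Robbins-Monro iteration X_{n+1} = X_n - a_n * Y_n with a_n = a/n,
    where y(x) returns a noisy unbiased observation of M(x)."""
    x = x0
    for n in range(1, n_iter + 1):
        x -= (a / n) * y(x)
    return x

# Toy example: M(x) = x - 3 observed with N(0,1) noise; the root is theta = 3.
y = lambda x: (x - 3.0) + rng.normal()
print(robbins_monro(y, x0=0.0))   # close to 3
```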
Kiefer and Wolfowitz (1952) suggested a similar procedure for finding the
(assumed) unique $\theta$ such that the regression function M(x) is maximized at $x = \theta$. Let
$\{a_n\}$ be as before and let $\{c_n\}$ be a second sequence of positive constants decreasing to
zero. Suppose $Y_{2n}$ and $Y_{2n-1}$ are conditionally unbiased for $M(X_n + c_n)$ and
$M(X_n - c_n)$, respectively. The Kiefer-Wolfowitz (K-W) algorithm is then of the form

(1.3)    $$X_{n+1} \;=\; X_n + a_n\,\frac{Y_{2n} - Y_{2n-1}}{2 c_n}\,.$$

Assuming that M(x) is strictly increasing for $x < \theta$ and strictly decreasing for $x > \theta$, it
was shown under additional conditions that $X_n$ is a weakly consistent estimator of $\theta$.
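A corresponding sketch of the K-W iteration, again on a hypothetical M(x); the step and spacing sequences $a_n = a/n$ and $c_n = c\,n^{-1/4}$ are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def kiefer_wolfowitz(y, x0, n_iter=5000, a=1.0, c=1.0, gamma=0.25):
    """Kiefer-Wolfowitz iteration with a_n = a/n and c_n = c/n**gamma:
    X_{n+1} = X_n + a_n * (Y(X_n + c_n) - Y(X_n - c_n)) / (2*c_n)."""
    x = x0
    for n in range(1, n_iter + 1):
        an, cn = a / n, c / n**gamma
        x += an * (y(x + cn) - y(x - cn)) / (2 * cn)
    return x

# Toy example: M(x) = -(x - 2)^2 observed with noise; the maximizer is theta = 2.
y = lambda x: -(x - 2.0) ** 2 + 0.1 * rng.normal()
print(kiefer_wolfowitz(y, x0=0.0))   # close to 2
```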
We mention some of the important developments in SA since the introduction of the
R-M and K-W procedures. Blum (1954i) was the first to use martingale theory to prove
a.s. convergence in both settings. In a later paper, Blum (1954ii) proposed several
multivariate SA procedures. A useful theorem (Robbins and Siegmund (1971)) for
showing a.s. convergence of SA-type procedures is stated and employed in Section 2.2.

Chung (1954) used a "convergence of moments" proof to show asymptotic
normality of $a_n^{-1/2}(X_n - \theta)$ for the R-M algorithm. The same technique was used by
Derman (1956) to establish a limiting normal distribution in the K-W case. Central
limit theorems for dependent random variables were the method of Sacks (1958) in
weakening the conditions of Chung and Derman. In the same paper is a result on
asymptotic normality for multivariate procedures. Fabian (1968) provides a theorem
which generalizes the earlier work on asymptotic normality discussed here. This
theorem is stated in a univariate form in Section 2.2.

Kersting (1977) demonstrated that the R-M process could be approximated by a
weighted sum of i.i.d. random variables. A law of the iterated logarithm and invariance
principle followed from this approximation. Ruppert (1982) derived similar properties
for a multivariate algorithm which includes both the R-M and K-W cases. A random
central limit theorem for a similar process is developed by Frees (1983). A short
summary of other works on approximation of SA processes is contained in Ruppert
(1982).
Most applications of SA require stopping the algorithm at some point. For the R-M
and K-W procedures, a practitioner might want a fixed-width confidence interval of
length 2d, for some d > 0, that covers $\theta$ with some prespecified probability. Sielken
(1973) provides stopping times $N_{d,\gamma}$ and $T_{d,\gamma}$ for the R-M case such that for any
$\gamma \in (0, .5)$

$$\lim_{d\to 0} P\big\{\, \big| X_{T_{d,\gamma}+1} - \theta \big| \le d \,\big\} \;=\; 1 - 2\gamma$$

and

$$\lim_{d\to 0} P\big\{\, \big| M\big(X_{N_{d,\gamma}+1}\big) - M(\theta) \big| \le d \,\big\} \;=\; 1 - 2\gamma\,.$$

McLeish (1976) arrives at a similar stopping time for the R-M procedure using a
functional central limit theorem for an appropriately defined process. Frees (1983)
considers a stopping rule for a more general process, a rule which is described and then
used in Section 3.5 of this work.
1.3.2 Estimating an optimal age replacement policy
Consider the maintenance problem described in Section 1.2.1 with the same
assumptions regarding the costs of planned and failure replacements, supposing all
replacements are made instantaneously. Under an ARP, if a part has not failed and
been replaced prior to reaching some fixed age T, it is preventively replaced at age T.
The expected long-run cost per unit time can be shown to be

$$A(T) \;=\; \frac{C_1 F(T) + C_2 S(T)}{\int_0^T S(t)\,dt}\,,$$

where F is the lifetime distribution of the parts and S(t) = 1 - F(t).
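A short numerical sketch of A(T); the Weibull lifetime distribution, the grid, and the costs are all illustrative assumptions:

```python
import numpy as np

def arp_cost(T, c1, c2, F, n_grid=2000):
    """A(T) = (c1*F(T) + c2*S(T)) / integral_0^T S(t) dt, with S = 1 - F."""
    t = np.linspace(0.0, T, n_grid)
    denom = np.trapz(1.0 - F(t), t)
    return (c1 * F(T) + c2 * (1.0 - F(T))) / denom

# Illustration: Weibull(shape 2) lifetimes (an increasing failure rate case).
F = lambda t: 1.0 - np.exp(-(t ** 2))
Ts = np.linspace(0.2, 3.0, 60)
costs = [arp_cost(T, c1=5.0, c2=1.0, F=F) for T in Ts]
print(Ts[int(np.argmin(costs))])   # approximate optimal age T
```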
Frees (1983) applies SA to the problem of estimating the optimal age replacement
policy when the distribution F is unknown. We summarize some of his results. It is
easily checked that

$$\frac{d}{dt}A(t) \;=\; \frac{M(t)}{\big\{ \int_0^t S(u)\,du \big\}^2}\,,$$

where

$$M(t) \;=\; (C_1 - C_2)\, f(t) \int_0^t S(u)\,du \;-\; S(t)\,\big[\, C_1 F(t) + C_2 S(t) \,\big]$$

and f is the density of F. Suppose that $\phi$ uniquely minimizes A(t) in t, $M(\phi) = 0$, and
$M(t)(t - \phi) > 0$ for $t \ne \phi$. (These are typical assumptions on M for identifiability of
$\phi$.) It then seems natural to try SA as a method for estimating $\phi$.
Let $\{X_{in}\}_{n=1}^{\infty}$, i = 1, 2, be sequences of i.i.d. random variables with distribution
F such that $\{X_{1n}\}$ and $\{X_{2n}\}$ are independent. Like the K-W method, Frees'
procedure requires two sequences of nonnegative real constants, $\{a_n\}$ and $\{c_n\}$.
Letting $Z_{in} = \min(X_{in},\, \phi_n + c_n)$, $F_{in}(t) = \chi(Z_{in} \le t)$, and $S_{in}(t) = 1 - F_{in}(t)$, an
estimate $M_n(t)$ of M(t) is defined from the $F_{in}$, the $S_{in}$, and a density estimate $f_n(t)$
of f(t). Then given $\phi_1$, the recursion

$$\phi_{n+1} \;=\; \phi_n - a_n\, M_n(\phi_n)$$

defines a sequence $\{\phi_n\}$ of estimates of $\phi$. (Frees actually considers an algorithm for
finding the zero of $M(g(t))$, where $g: R \to (0, \infty)$ is a known strictly increasing
continuous function. This ensures that all estimates of $\phi$ are positive. Our
simplification suffices for this discussion.)
Frees first establishes that, under suitable conditions, $\phi_n \to \phi$ a.s. and $\phi_n$, properly
standardized, has a limiting normal distribution. It is then shown that by using a more
refined kernel density estimator, the rate of convergence of the estimate $\phi_n$ can be
improved. The asymptotic mean square error can be minimized by proper choice of the
sequences $\{a_n\}$ and $\{c_n\}$. An "adaptive" procedure is defined which estimates the
optimal sequences as the algorithm proceeds. With this procedure the minimal
asymptotic mean square error is attained. A sequential fixed-width confidence interval
scheme is then defined which provides the correct coverage probability asymptotically
as the desired confidence interval width goes to zero.
1.3.3 Estimation of the renewal function
Frees (1986i) considers several estimators of H(t) with reference to an application
in warranty analysis. Let $X_1, \ldots, X_n$ be i.i.d. random variables with distribution F
which has positive mean $\mu_1$ and finite variance $\sigma^2$. In practice, observations could be
extracted from a realization of the renewal process with underlying distribution F.

The first estimator assumes F is non-arithmetic and depends heavily on the
limiting form of the renewal function as described by (1.2):

$$H_{1n}(t) \;=\; \frac{t}{\hat\mu_n} + \frac{\hat\sigma_n^2}{2\hat\mu_n^2} - \frac12\,.$$

Here $\hat\mu_n$ and $\hat\sigma_n$ are estimates of $\mu_1$ and $\sigma$, respectively. In general, this estimator is not
consistent for finite t; however, Frees concludes it may perform well for large t.
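A small plug-in sketch of this large-t estimator, with sample moments standing in for $\hat\mu_n$ and $\hat\sigma_n$ (the Gamma sample is an illustrative assumption):

```python
import numpy as np

def h1n(x, t):
    """Plug-in estimate based on the expansion (1.2):
    H(t) ~ t/mu1 + sigma^2/(2*mu1^2) - 1/2 for large t."""
    x = np.asarray(x, dtype=float)
    mu1, sig2 = x.mean(), x.var()
    return t / mu1 + sig2 / (2 * mu1 ** 2) - 0.5

rng = np.random.default_rng(6)
x = rng.gamma(2.0, 1.0, size=500)   # mean 2; for large t, H(t) ~ t/2 - 1/4
print(h1n(x, t=20.0))
```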
Another approach is based on the expression (1.1). Suppose
$F(x) = F(x; \alpha_1, \ldots, \alpha_p)$ is known up to a finite number of parameters $\alpha_1, \ldots, \alpha_p$.
Suppose $\alpha_{1n}, \ldots, \alpha_{pn}$ are consistent estimates of $\alpha_1, \ldots, \alpha_p$. A natural estimate of
H(t) is then

$$H_{2n}(t) \;=\; \sum_{k=1}^{\infty} F_n^{(k)}(t)\,,$$

where $F_n^{(k)}$ is the k-fold convolution of $F_n(x) = F(x; \alpha_{1n}, \ldots, \alpha_{pn})$. Frees proves that
if F is absolutely continuous (discrete) and the density (probability mass function) f is
continuous in $\alpha_1, \ldots, \alpha_p$, then $H_{2n}(t) \to H(t)$ with probability one.
A third estimator is also suggested by (1.1). Let $X_{i_1}, \ldots, X_{i_k}$ represent any
subset of k observations and $\sum_c$ denote the summation over all $\binom{n}{k}$ subsets of k
observations. An unbiased estimate of the k-fold convolution of F is the U-statistic

$$F_n^{(k)}(t) \;=\; \binom{n}{k}^{-1} \sum_c\, \chi\big(\, X_{i_1} + \cdots + X_{i_k} \le t \,\big)\,.$$

An estimate of the renewal function is then

$$H_{3n}(t) \;=\; \sum_{k=1}^{m} F_n^{(k)}(t)\,,$$

where $m = m(n) \le n$ and $m \to \infty$ as $n \to \infty$. This estimator is strongly consistent assuming
some mild conditions on F and m (Frees (1986ii)).
Frees draws some conclusions about these estimators from their theoretical
properties and their performance on some data and in a Monte Carlo study. By
definition, $H_{1n}(t)$ is appropriate only for t large relative to $\mu_1$. If F is known up to a
finite number of parameters, $H_{2n}(t)$ should perform well for both small and large t.
However, problems of robustness may arise if F does not fully meet the assumptions
made. Finally, $H_{3n}(t)$ would be the preferred estimator for small t when no parametric
assumptions can be made.
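The U-statistic form of $H_{3n}$ is direct to compute for small n by enumerating k-subsets; the following sketch is a naive implementation for illustration only (its cost grows like $\binom{n}{k}$):

```python
import numpy as np
from itertools import combinations
from math import comb

def conv_cdf_ustat(x, k, t):
    """U-statistic estimate of the k-fold convolution F^(k)(t):
    the fraction of k-subsets whose sum is <= t."""
    x = np.asarray(x)
    hits = sum(1 for idx in combinations(range(len(x)), k)
               if x[list(idx)].sum() <= t)
    return hits / comb(len(x), k)

def h3n(x, t, m):
    """Nonparametric renewal function estimate H_3n(t) = sum_{k=1}^m F_n^(k)(t)."""
    return sum(conv_cdf_ustat(x, k, t) for k in range(1, m + 1))

rng = np.random.default_rng(3)
x = rng.gamma(2.0, 1.0, size=12)    # small n keeps the subset sums exact
print(h3n(x, t=4.0, m=4))
```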
The statistical properties of $H_{3n}(t)$ are further investigated in Frees (1986ii),
reviewed here in Section 4.1. In this paper Frees discusses yet another estimator of
H(t). If $F_n^{(1)}(t)$ is the ordinary sample distribution function, let $F_n^{(k)}(t)$ be its k-fold
convolution. (Each of these is a V-statistic.) Then define $H_n(t)$ by

$$H_n(t) \;=\; \sum_{k=1}^{m} F_n^{(k)}(t)\,.$$

Frees asserts that under fairly weak conditions this estimate is consistent and
asymptotically normal.
1.4 Summary
1.4.1 Chapters 2 and 3
Chapter 2 introduces a stochastic approximation algorithm for estimating an
optimal block replacement policy.
Sufficient conditions are given for the procedure to
yield a strongly consistent sequence of estimates. A result on the asymptotic normality
of the estimator provides information about its rate of convergence. We discuss how the
procedure can be carried out concurrently with an actual maintenance program to
achieve real savings in cost while estimating the optimal policy.
The final section
describes a major drawback of the procedure, namely that it is typically impossible to
verify a critical assumption.
In Chapter 3 we undertake various improvements of the algorithm defined in
Chapter 2.
It is indicated how under certain circumstances one can significantly
improve the rate of convergence of the estimator.
Adaptive stochastic approximation
techniques are applied to develop a procedure which minimizes the asymptotic mean
square error. Finally, the procedure is made fully sequential by defining a stopping time
for fixed-width confidence interval estimation of an optimal BRP.
1.4.2 Chapters 4 and 5
In Chapter 4 concern lies with sequential estimation of the renewal function at a
point using the estimator considered by Frees (1986ii). (This is the estimator $H_{3n}$
defined in Section 1.3.3.) We first note how this estimator is applicable under slightly
more general conditions than given in Frees (1986ii). A jackknife estimator of the
asymptotic variance, as well as an estimator considered by Frees for the same quantity,
are shown to be strongly consistent. These variance estimators are used in defining the
usual type of stopping time for a fixed-width confidence interval; with the help of an
appropriate invariance principle, developed in Section 4.3, the corresponding fixed-width
confidence interval can be shown to have the correct asymptotic coverage probability as
the interval width goes to zero.
In Chapter 5 we move on to consider estimating the renewal function
simultaneously at all points t in the interval [0, T], again with the estimator considered
by Frees. First, a stopping time for minimum risk estimation is considered, where the
risk function is composed of an integrated mean square error term and a term which
accounts for the cost of the sample. It is seen that such a stopping time can provide
guidance in balancing the cost of error with the cost of the sample. Next, the
behaviour of the function estimator is more carefully characterized in an invariance
principle. Motivated by this result, bootstrap methods are advocated for the
construction of both fixed and variable width confidence bands for the renewal function
on [0, T]. Stopping times for the construction of these two types of confidence bands
are then introduced, directly motivating Chapter 6.
1.4.3 Chapter 6
We consider stopping times for the construction of fixed-width confidence bands
to partially answer the problem posed in Section 1.2.3; the objective is to provide some
guidance in deciding when an adequate sample size has been attained for the desired
levels of confidence and precision. The theoretical development is similar to that
available for fixed-width confidence interval estimation of a real-valued parameter.
Properties of the stopping time such as asymptotic efficiency and normality (as the band
width goes to zero) are explored. We then consider some consequences of using the
stopping time in conjunction with a bootstrap procedure for constructing confidence
bands. That these same sequential methods can be applied in constructing a confidence
region for a statistical function $R_F$ defined on $R^p$, $p \ge 2$, is shown by means of an
example. Stopping times for variable width confidence bands, where the width of the
band varies with the argument t, are also considered.
CHAPTER 2
ESTIMATING AN OPTIMAL BLOCK REPLACEMENT POLICY
2.1 Introduction
In this chapter stochastic approximation methodology is applied to the problem of
estimating an optimal block replacement policy.
Section 2.2 introduces a basic stochastic approximation algorithm which yields a
sequence of estimates of the optimal replacement interval.
Sufficient conditions are
given for consistency of the procedure; a theorem on the asymptotic normality of the
estimate provides information about the rate of convergence. Section 2.3 contains proofs
of these results.
Section 2.4 describes how the theoretical procedure of Section 2.2 relates to what
might be done in practice.
Successive estimates of the optimal time interval could be
used to carry out a random BRP, where at the time of each planned replacement the
current estimate is used to determine the time of the next planned replacement.
We
verify that if a consistent estimate of the optimal policy is available, then a random
BRP can be implemented such that the incurred cost per unit time converges to the
optimum.
Section 2.5 reveals an important drawback of the procedure, namely its
dependence on an assumption that may not hold in practice or may be impossible to
verify.
2.2 The Basic Algorithm

Recall that with notation previously defined in Section 1.2.1 the cost function for
the BRP is

$$B(t) \;=\; \frac{C_1 H(t) + C_2}{t}\,,$$

where H is the renewal function associated with F and $C_1 > C_2$. The objective is to
minimize B in t. Assume that F is absolutely continuous so that the renewal density
$h = H^{(1)}$ exists. Define $D(t) = B^{(1)}(t)$. Suppose that B(t) has a unique minimum at
$t = \phi^*$ with $D(\phi^*) = 0$.

Let $\{a_n\}$ be a sequence of positive constants decreasing to zero. Given an
initial estimate $\phi_1^*$ of $\phi^*$, the Kiefer-Wolfowitz algorithm (1.3) suggests defining
successive estimates of $\phi^*$ by the formula

$$\phi_{n+1}^* \;=\; \phi_n^* - a_n\, D_n(\phi_n^*)\,,$$

where $D_n(\phi_n^*)$ is an estimate of $D(\phi_n^*)$. This procedure would have the disadvantage of
possibly generating negative estimates, even though $\phi^*$ is necessarily greater than zero.
If it is given that $\phi^*$ is greater than some known $\delta > 0$, then one might use the algorithm

$$\phi_{n+1}^* \;=\; \max\big\{\, \delta,\ \phi_n^* - a_n\, D_n(\phi_n^*) \,\big\}\,.$$

Rather than constraining the estimates in this way, the following approach is taken.
Assume there exists a known $\delta$, $0 < \delta < \phi^*$. Let $g: R \to (\delta, \infty)$ be a known,
continuous, strictly increasing function having s+1 bounded derivatives. Suppose $\phi$
minimizes $B\circ g(t)$ in t, so that $\phi^* = g(\phi)$. Define

$$Dg(t) \;=\; B\circ g^{(1)}(t)\,.$$

Since $Dg(\phi) = 0$, we now consider using a stochastic approximation procedure to
estimate the zero of Dg. Given an initial estimate $\phi_1$, define successive estimates of
$\phi$ by

(2.1)    $$\phi_{n+1} \;=\; \phi_n - a_n\, Dg_n(\phi_n)\,,$$

where $Dg_n(\phi_n)$ is an estimate of $Dg(\phi_n)$. Given $\phi_n$, take $g(\phi_n)$ as an estimate of $\phi^*$.

The next step is to define an estimator $Dg_n(t)$ of $Dg(t)$. Let $\{X_{ij}\}_{i,j=1}^{\infty}$ be
independent and identically distributed random variables with distribution function F
defined on a probability space $(\Omega, \mathcal{F}, P)$. Take $N_n = \{N_n(t),\ t \ge 0\}$ to be the renewal
process associated with the sequence $\{X_{nj}\}_{j=1}^{\infty}$, with renewals occurring at $X_{n1}$,
$X_{n1}+X_{n2}$, $X_{n1}+X_{n2}+X_{n3}, \ldots$ for n = 1, 2, .... Let $\{c_n\}$ be a sequence of positive
constants decreasing to zero. Define for $t \in R$

$$hg_n(t) \;=\; \frac{N_n\big(g(t + c_n)\big) - N_n\big(g(t - c_n)\big)}{2 c_n}\,,$$

an estimate of $H\circ g^{(1)}(t)$. Finally, let

(2.2)    $$Dg_n(t) \;=\; \frac{C_1\, hg_n(t)\, g(t) \;-\; g^{(1)}(t)\,\big[\, C_1\, N_n(g(t)) + C_2 \,\big]}{[g(t)]^2}\,,$$

with estimates of $\phi$ defined recursively by (2.1).
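A minimal simulation sketch of the recursion (2.1) with the estimator of (2.2) as reconstructed above; the choice $g(t) = \delta + e^t$, the Gamma lifetimes, and the costs are all illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_path(sampler, horizon):
    """Renewal epochs S_1 < S_2 < ... of one fresh path, up to past `horizon`."""
    epochs, s = [], 0.0
    while s <= horizon:
        s += sampler()
        epochs.append(s)
    return np.asarray(epochs)

def brp_sa(c1, c2, sampler, phi1, n_iter, A=1.0, C=1.0, gamma=0.25, delta=0.1):
    """SA recursion (2.1), phi_{n+1} = phi_n - a_n * Dg_n(phi_n), with
    g(t) = delta + exp(t) and Dg_n in the form reconstructed in (2.2)."""
    g, g1 = (lambda t: delta + np.exp(t)), (lambda t: np.exp(t))
    phi = phi1
    for n in range(1, n_iter + 1):
        an, cn = A / n, C / n ** gamma
        epochs = sample_path(sampler, g(phi + cn))   # one new path per step
        N = lambda s: np.searchsorted(epochs, s, side="right")
        hg = (N(g(phi + cn)) - N(g(phi - cn))) / (2.0 * cn)  # est. of (H o g)'(phi)
        Dg = (c1 * hg * g(phi) - g1(phi) * (c1 * N(g(phi)) + c2)) / g(phi) ** 2
        phi = phi - an * Dg
    return g(phi)        # estimate of the optimal block replacement interval

sampler = lambda: rng.gamma(2.0, 1.0)    # IFR lifetimes (illustrative)
print(brp_sa(c1=5.0, c2=1.0, sampler=sampler, phi1=0.0, n_iter=4000))
```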
The following assumptions will be employed.

A1: The distribution function F has increasing failure rate. F is absolutely continuous
with bounded continuous density f, F(0) = 0, and $\mu_2 = \int_0^{\infty} t^2\,dF(t) < \infty$.

A2: There exists known $\delta$ such that $0 < \delta < \phi^*$. For $t \in (\delta, \infty)$,
$D(t)(t - \phi^*) > 0$ for $t \ne \phi^*$.

A3: $g: R \to (\delta, \infty)$ is a known, continuous, strictly increasing function with s+1
bounded derivatives such that $g^{(2)}$ is continuous.

A4: $\{a_n\}$ and $\{c_n\}$ are sequences of positive constants such that
$\lim_{n\to\infty} a_n = \lim_{n\to\infty} c_n = 0$, $\sum_{n=1}^{\infty} a_n = \infty$,
$\sum_{n=1}^{\infty} a_n c_n < \infty$, and $\sum_{n=1}^{\infty} a_n^2 / c_n < \infty$.

A5: Assume $H\circ g^{(1)}$ is continuous on R.
(a) $H\circ g^{(2)}(t)$ exists for each $t \in R$ and is bounded.
(b) A5(a) holds and $H\circ g^{(2)}$ is continuous in a neighborhood of $\phi$.
(c) $H\circ g^{(2)}(t)$ exists for each $t \in R$ and, for some constant $K_1 > 0$, satisfies the
Lipschitz bound $|H\circ g^{(2)}(s) - H\circ g^{(2)}(t)| \le K_1\,|s - t|$ for all s, t.

A5': Assume $H\circ g^{(2)}$ is continuous on R.
(a) $H\circ g^{(3)}(t)$ exists for each $t \in R$ and is bounded.
(b) A5'(a) holds and $H\circ g^{(3)}$ is continuous at $\phi$.

A6: For constants A, C > 0, let $\Gamma = A\,\frac{d}{dt}Dg(t)\big|_{t=\phi}$, $a_n = A n^{-1}$, and
$c_n = C n^{-\gamma}$, where $\gamma \in (0, 1)$ is such that $1 - \gamma < 2\Gamma$.
Theorems 2.1 and 2.2 give sufficient conditions for a.s. convergence and
asymptotic normality of the estimates generated by the procedure.

Theorem 2.1: Assume A1-A4 hold. Further suppose that any one of the assumptions
A5(a), A5(c), or A5'(a) holds. Then $\phi_n \to \phi$ a.s.

Theorem 2.2: Assume A1-A3 and A6 hold. Let $E = \dfrac{C_1^2\, H\circ g^{(1)}(\phi)}{2\,[g(\phi)]^2}$. With $\Gamma$ defined
by A6, if either
(a) A5(b) holds with $\frac13 \le \gamma < 1$, or
(b) A5(c) or A5'(a) holds with $\frac15 < \gamma < 1$, then

$$n^{(1-\gamma)/2}\,(\phi_n - \phi) \;\to_D\; N\Big( 0,\ \frac{A^2\, C^{-1}\, E}{2\Gamma - (1-\gamma)} \Big)\,.$$

(c) If A5'(b) holds with $\gamma = \frac15$, then

$$n^{2/5}\,(\phi_n - \phi) \;\to_D\; N\Big( -\,\frac{A\, C^2\, C_1\, H\circ g^{(3)}(\phi)}{6\, g(\phi)\,\big[\,\Gamma - \frac25\,\big]}\,,\ \frac{A^2\, C^{-1}\, E}{2\Gamma - \frac45} \Big)\,.$$
2.3 Proofs

This section is devoted to proofs of Theorems 2.1 and 2.2. The following theorem
due to Robbins and Siegmund (1971) is the main tool used in establishing Theorem 2.1.

Theorem (Robbins-Siegmund): Let $\mathcal{F}_n$ be a nondecreasing sequence of sub-$\sigma$-fields
of $\mathcal{F}$. Suppose that $X_n$, $\beta_n$, $\eta_n$, and $\gamma_n$ are nonnegative $\mathcal{F}_n$-measurable random
variables such that

$$E\big[\, X_{n+1} \mid \mathcal{F}_n \,\big] \;\le\; X_n\,(1 + \beta_n) + \eta_n - \gamma_n\,.$$

Then $\lim_{n\to\infty} X_n$ exists and $\sum_{n=1}^{\infty} \gamma_n < \infty$ on the set where
$\sum_{n=1}^{\infty} \beta_n < \infty$ and $\sum_{n=1}^{\infty} \eta_n < \infty$.
Before proceeding with the proofs, some simple consequences of the assumptions
are noted. For any sequence of real numbers $\{x_n\}$, let
$Y_n = N_n(g(x_n + c_n)) - N_n(g(x_n - c_n))$. Assuming any part of A5 or A5' holds, by an
appropriate Taylor expansion, A1, and A4,

$$E\,Y_n \;=\; H\circ g(x_n + c_n) - H\circ g(x_n - c_n) \;=\; H\circ g^{(1)}(x_n)\,2c_n + o(c_n)\,.$$

By A1, for $k \ge 2$,

(2.3)    $$P\{Y_n = k\} \;\le\; \sum_{j=1}^{k-1} \big[\, F^{(j)}(g(x_n+c_n)) - F^{(j)}(g(x_n-c_n)) \,\big]\,\big[\, F\big(g(x_n+c_n) - g(x_n-c_n)\big) \,\big]^{k-1}\,.$$

Since $c_n \downarrow 0$, ultimately $F(g(x_n+c_n) - g(x_n-c_n)) < 1$, and thus for any integer p > 0,

(2.4)    $$E\big\{\, Y_n^p\,\chi[\,Y_n \ge 2\,] \,\big\} \;\le\; K_1 c_n \sum_{k=2}^{\infty} k^p\,\big[\, F\big(g(x_n+c_n) - g(x_n-c_n)\big) \,\big]^{k-1} \;=\; o(c_n)\,.$$

The above implies that

$$P\{\,Y_n = 1\,\} \;=\; H\circ g^{(1)}(x_n)\,2c_n + o(c_n)$$

and that for any real number p > 0

(2.5)    $$E\,Y_n^p \;=\; H\circ g^{(1)}(x_n)\,2c_n + o(c_n)\,.$$
(xn)2cn
We will also need some properties of renewal processes for which the underlying
distribution has increasing failure rate.
A lifetime distribution
(decreasing) failure rate (IFR and DFR, respectively) if log[l -
F is increasing
F(x)] is concave
(convex) in x. If F has density f, F being IFR (DFR) is equivalent to the hazard rate
f(x)
rex) = 1 _ F(x)
being non-decreasing (non-increasing) in x.
IFR distributions are of particular
importance in the study of maintenance policies, since increased probability of failure
upon aging is the motivation for preventive replacement of parts.
Barlow, Marshall, and Proschan (1963) and later Barlow and Proschan (1964)
explore properties of renewal processes with underlying IFR and DFR distributions.
(See also Barlow and Proschan (1965) for some of this work.)
Several of these
properties will be used here. Suppose F is an IFR (DFR) distribution with mean Jll and
associated renewal function H. Then
(2.6)
Var N(t)
< (~) H(t) ~ (~)
Jl\'
for all t ~ 0;
(2.7)
Define the IT-fields c:r n
= IT(X,1,J"
i=l, ... , n-1, j=l, 2, ... ).
- 20 -
Use $E_{\mathcal{F}_n}$ to denote expectation with respect to the $\sigma$-field $\mathcal{F}_n$. Since $N_{n-1}$ is
$\mathcal{F}_n$-measurable, by (2.1) and (2.2) the estimate $\phi_n$ is $\mathcal{F}_n$-measurable. Define

$$\Delta_n \;=\; E_{\mathcal{F}_n} Dg_n(\phi_n) - Dg(\phi_n)$$

and

(2.8)    $$V_n \;=\; c_n^{1/2}\,\big[\, Dg_n(\phi_n) - Dg(\phi_n) - \Delta_n \,\big]\,.$$

In Lemmas 2.1 and 2.2, bounds are derived for $\Delta_n$ and $E_{\mathcal{F}_n}[Dg_n(\phi_n)]^2$, respectively, to
be used in the application of the Robbins-Siegmund result. $V_n$ plays an important role
in the proof of Theorem 2.2; the asymptotic variance of $V_n$ will be the constant E of
Theorem 2.2.
Lemma 2.1: Assume A1, A2, A3, and A4 hold.
(a) If A5(a) holds, then $|\Delta_n| \le K_1 c_n$.
(b) If A5(c) or A5'(a) holds, then $|\Delta_n| \le K_2 c_n^2$.

Proof: By definition, since $N_n$ is independent of $\mathcal{F}_n$ while $\phi_n$ is $\mathcal{F}_n$-measurable,

If A5(a) holds, then

If A5(c) holds, then

by the Taylor expansion for g. If A5'(a) holds, then

where again $|\eta_i - \phi_n| \le c_n$, i = 1, 2. In this case

completing the proof of part (b) of the lemma. □
Lemma 2.2: Assume A1-A4 hold. Further suppose that A5(a), A5(c), or A5'(a) holds.
Then $E_{\mathcal{F}_n}\big[\,Dg_n(\phi_n)\,\big]^2 \le K_{11}/c_n$.

Proof: Using the definition of $Dg_n$ and the fact that $(a+b)^2 \le 4(a^2 + b^2)$, one has that

By (2.5), A1, and A3,

Since F has increasing failure rate, it follows from (2.6) and A3 that

Noting that $c_n = O(1)$, the lemma follows from the inequality for $Dg_n^2$ at the beginning of
the proof. □
Proof of Theorem 2.1: Letting $U_n = \phi_n - \phi$, we have by (2.1) that

Since $\phi_n$ is $\mathcal{F}_n$-measurable,

$$E_{\mathcal{F}_n} U_{n+1}^2 \;=\; U_n^2 + a_n^2\, E_{\mathcal{F}_n}\big[\,Dg_n(\phi_n)\,\big]^2 - 2 a_n U_n\, E_{\mathcal{F}_n} Dg_n(\phi_n)\,.$$

Noting that $E_{\mathcal{F}_n} Dg_n(\phi_n) = Dg(\phi_n) + \Delta_n$, it follows that

If A5(a) holds, then by Lemma 2.1

Similarly, under A5(c) or A5'(a),

since $c_n = o(1)$. Thus by Lemmas 2.1 and 2.2,

By A2 and A4 the assumptions of the Robbins-Siegmund result are met. Thus for some
finite random variable $\xi$ one has that $\phi_n \to \xi$ a.s. and
$\sum_{n=1}^{\infty} a_n (\phi_n - \phi)\, Dg(\phi_n) < \infty$. Since $(\phi_n - \phi)\, Dg(\phi_n) > 0$ for $\phi_n \ne \phi$ by A2 and
$\sum_{n=1}^{\infty} a_n = \infty$ by A4, $\phi_n \to \phi$ a.s. □
Before beginning a sequence of lemmas that leads to the proof of Theorem 2.2, we
quote a theorem of Fabian (1968) on asymptotic normality of stochastic approximation
procedures. The univariate version of this theorem given here is taken from
Frees (1983).

Theorem (Fabian): Suppose $\mathcal{F}_n$ is a nondecreasing sequence of sub-$\sigma$-fields of $\mathcal{F}$.
Suppose $U_n$, $V_n$, $T_n$, $\Gamma_n$, and $\Phi_n$ are random variables such that $\Gamma_n$, $\Phi_{n-1}$,
and $V_{n-1}$ are $\mathcal{F}_n$-measurable. Let $\alpha$, $\beta$, $T$, $E$, $\Gamma$, and $\Phi$ be real constants with $\Gamma > 0$
such that

$$\Gamma_n \to \Gamma\,,\qquad \Phi_n \to \Phi\,,\qquad T_n \to T \ \ \text{a.s.},\qquad E_{\mathcal{F}_n} V_n = 0\,,$$

and there exists a constant C such that $C > |E_{\mathcal{F}_n} V_n^2 - E| \to 0$. Suppose, with
$\sigma_{j,r}^2 = E\,\chi\big[\,V_j^2 \ge r j^{\alpha}\,\big]\,V_j^2$, that

$$\lim_{j\to\infty} \sigma_{j,r}^2 = 0 \ \text{for all } r\,, \qquad\text{or}\qquad \alpha = 1 \ \text{and} \ \lim_{n\to\infty} n^{-1}\sum_{j=1}^{n} \sigma_{j,r}^2 = 0 \ \text{for all } r\,.$$

Let $\beta_+ = \beta\,\chi[\,\alpha = 1\,]$ where $0 < \alpha \le 1$, $0 \le \beta$, $\beta_+ < 2\Gamma$, and

$$U_{n+1} \;=\; U_n\big[\, 1 - n^{-\alpha}\Gamma_n \,\big] + n^{-(\alpha+\beta)/2}\,\Phi_n V_n + n^{-\alpha-\beta/2}\,T_n\,.$$

Then

$$n^{\beta/2}\,U_n \;\to_D\; N\Big(\, \frac{T}{\Gamma - \beta_+/2}\,,\ \frac{\Phi^2 E}{2\Gamma - \beta_+} \,\Big)\,.$$

Lemma 2.3: Assume A1-A3 and A6 hold.
(a) If A5(b) holds, then $\Delta_n = o(c_n)$.
(b) If A5(c) or A5'(a) holds, then $\Delta_n = O(c_n^2)$.
(c) If A5'(b) holds, then $\lim_{n\to\infty} \Delta_n / c_n^2 = \dfrac{C_1\, H\circ g^{(3)}(\phi)}{6\, g(\phi)}\,.$
Proof: If A5(b) holds, then from the proof of Lemma 2.1 and since $g(\phi_n) > \delta$,

$$|\Delta_n| \;\le\; \tfrac14\, C_1\, \big|\, H\circ g^{(2)}(\eta_1) - H\circ g^{(2)}(\eta_2) \,\big|\, c_n \big/ g(\phi_n) \;\le\; K_1\, \big|\, H\circ g^{(2)}(\eta_1) - H\circ g^{(2)}(\eta_2) \,\big|\, c_n$$

with $|\eta_i - \phi_n| \le c_n$, i = 1, 2. Since $|\phi_n - \phi| \to 0$ a.s. by Theorem 2.1 and $H\circ g^{(2)}$ is
continuous in a neighborhood of $\phi$, part (a) holds.

Part (b) is identical to Lemma 2.1(b).

Also from the proof of Lemma 2.1, when A5'(b) holds,

with $|\eta_i - \phi_n| \le c_n$, i = 1, 2. Since $\phi_n \to \phi$ a.s., (c) follows when $H\circ g^{(3)}$ is continuous in
a neighborhood of $\phi$. □
Lemma 2.4: Assume A1-A3. If A5(b), A5(c), A5'(a), or A5'(b) holds, then for $t \ge 1$,

where $V_n$ is defined by (2.8).

Proof: By definition

Since F has increasing failure rate, by (2.7)

$$\cdots \;\le\; [g(\phi_n)]^{-2t}\, e^{-g(\phi_n)/\mu_1} \sum_{j=0}^{[t+1]} \frac{\big[ g(\phi_n)/\mu_1 \big]^j}{j!} \;+\; e^{-g(\phi_n)/\mu_1} \sum_{j=[t+1]+1}^{\infty} \frac{\big[ g(\phi_n)/\mu_1 \big]^{\,j-[t+1]}}{j!}\,,$$

where [t+1] is the greatest integer less than or equal to t+1 and we have used
$[t+1] \le 2t$ for $t \ge 1$. By (2.6) and A2,

Hence, again using the fact that $g(\phi_n) > \delta$,

which establishes the result. □
Lemma 2.5: Assume A1-A3 and A6 hold. Further suppose that one of the assumptions
A5(b), A5(c), A5'(a), or A5'(b) holds. Then
(a) $E_{\mathcal{F}_n} V_n = 0$, and
(b) $\lim_{n\to\infty} E_{\mathcal{F}_n} V_n^2 = E = \dfrac{C_1^2\, H\circ g^{(1)}(\phi)}{2\,[g(\phi)]^2}\,.$

Proof: For part (a) we note that $E_{\mathcal{F}_n} V_n = 0$ by definition. Next, let

$$X_n = \frac{hg_n(\phi_n)}{g(\phi_n)}\,,\qquad Y_n = \frac{E_{\mathcal{F}_n} hg_n(\phi_n)}{g(\phi_n)}\,,\qquad Z_n = g^{(1)}(\phi_n)\,[g(\phi_n)]^{-2}\,\big[\, N_n(g(\phi_n)) - H\circ g(\phi_n) \,\big]\,,$$

so that $V_n^2 = c_n C_1^2\,[\,X_n - Y_n - Z_n\,]^2$. To show (b), we consider separately the
expectations $E_{\mathcal{F}_n} X_n^2$, $E_{\mathcal{F}_n} X_n Y_n$, etc.

(i) By (2.5), Theorem 2.1, and the continuity of $H\circ g^{(1)}$,

$$c_n E_{\mathcal{F}_n} X_n^2 \;=\; c_n\,[g(\phi_n)]^{-2}\, \frac{H\circ g^{(1)}(\phi_n)\,2c_n + o(c_n)}{4 c_n^2} \;\to\; \frac{H\circ g^{(1)}(\phi)}{2\,[g(\phi)]^2} \qquad \text{as } n \to \infty\,.$$

(ii) Using (2.5) and the fact that $g(\phi_n) > \delta$, $c_n E_{\mathcal{F}_n} X_n Y_n \to 0$ as $n \to \infty$ by the
continuity of H.

(iii) By the conditional version of the Cauchy-Schwarz inequality, and then by (2.5)
and (2.6), $c_n E_{\mathcal{F}_n} X_n Z_n \to 0$.

(iv) Again by (2.5),

$$c_n E_{\mathcal{F}_n} Y_n^2 \;\le\; K_1 c_n^{-1}\,\big[\, 2 H\circ g^{(1)}(\phi_n)\, c_n + o(c_n) \,\big]^2 \;\to\; 0 \qquad \text{as } n \to \infty\,.$$

(v) For all n, $E_{\mathcal{F}_n} Y_n Z_n = 0$.

(vi) It follows from (2.6) that $\lim_{n\to\infty} c_n E_{\mathcal{F}_n} Z_n^2 = 0$.

The lemma follows immediately from (i)-(vi) since by definition
$V_n^2 = c_n C_1^2\,[\,X_n - Y_n - Z_n\,]^2$. □
Lemma 2.6: Assume A1-A3 and A6 hold. Further suppose that one of the assumptions
A5(b), A5(c), A5'(a), or A5'(b) holds. Then
(a) there exists K > 0 such that $E_{\mathcal{F}_n} V_n^2 \le K$ for all n, and
(b) for any $t > 1$, $\lim_{n\to\infty} E_{\mathcal{F}_n}\big( c_n^{1/2} V_n \big)^t = 0$.

Proof: Since the conditions of Lemma 2.4 hold,

By A1 and (2.5)

Again by A1 and (2.5),

This establishes (a).

By the definition of $hg_n$ and (2.5) one has that

$$c_n^{t+1}\, E_{\mathcal{F}_n}\big|\, hg_n(\phi_n) \,\big|^{t+1} \;\le\; 2^{-(t+1)}\, E_{\mathcal{F}_n}\big[\, N_n(g(\phi_n+c_n)) - N_n(g(\phi_n-c_n)) \,\big]^{t+1} \;=\; 2^{-(t+1)}\,\big[\, 2 H\circ g^{(1)}(\phi_n)\, c_n + o(c_n) \,\big]\,.$$

Also by (2.5),

This establishes (b) using Lemma 2.4. □
Lemma 2.7: Let $\sigma_{n,r}^2 = E\big\{ V_n^2\,\chi[\,V_n^2 \ge rn\,] \big\}$ for r = 1, 2, ... and n = 1, 2, ....
Assume Lemma 2.6 and A6 hold. Then for each r, $\lim_{n\to\infty} \sigma_{n,r}^2 = 0$.

Proof: For the $\gamma \in (0, 1)$ of assumption A6, take $q > 1$ such that $\gamma \le 1/q$. Then take p
such that $\frac1p + \frac1q = 1$. The proof is then identical to that of Lemma 2.7 of Frees (1983),
using the fact that by Lemma 2.6, $E\big( c_n^{1/2} V_n \big)^p = O(1)$. □
Proof of Theorem 2.2: We apply the result due to Fabian (1968) in the form quoted
earlier. By the definition of $V_n$,

$$Dg_n(\phi_n) \;=\; Dg(\phi_n) + \Delta_n + c_n^{-1/2}\, V_n\,.$$

Expanding $Dg(\phi_n)$ about $Dg(\phi) = 0$ one has that

$$Dg(\phi_n) \;=\; (\phi_n - \phi)\,\frac{d}{dt}Dg(t)\Big|_{t=\eta_n}\,,$$

where $|\eta_n - \phi| \le |\phi_n - \phi|$. As before let $U_n = \phi_n - \phi$. Then

$$U_{n+1} \;=\; U_n\Big[\, 1 - a_n \frac{d}{dt}Dg(t)\Big|_{t=\eta_n} \,\Big] - A\,n^{-1} c_n^{-1/2}\, V_n - a_n \Delta_n\,.$$

Set $\Gamma_n = A\,\frac{d}{dt}Dg(t)\big|_{t=\eta_n}$, $\Phi_n = \Phi = -A C^{-1/2}$, $\alpha = 1$, and $\beta = 1 - \gamma$. Then

$$U_{n+1} \;=\; U_n\big[\, 1 - n^{-1}\Gamma_n \,\big] + n^{-(\alpha+\beta)/2}\,\Phi_n V_n + n^{-\alpha-\beta/2}\,T_n\,.$$

Let $\Gamma = A\,\frac{d}{dt}Dg(t)\big|_{t=\phi}$. Since $\frac{d}{dt}Dg(t)$ is continuous at $\phi$ and $\phi_n \to \phi$ a.s. by Theorem
2.1, it follows that $\Gamma_n \to \Gamma$ as $n \to \infty$. With $E = \frac{C_1^2\, H\circ g^{(1)}(\phi)}{2[g(\phi)]^2}$, by Lemmas 2.5 and
2.6 there exists a constant K such that $|E_{\mathcal{F}_n} V_n^2 - E| < K$ for all n and $E_{\mathcal{F}_n} V_n^2 \to E$ as
$n \to \infty$. Using Lemma 2.7, $\lim_{j\to\infty} \sigma_{j,r}^2 = \lim_{j\to\infty} E\big\{ V_j^2\,\chi[\,V_j^2 \ge r j^{\alpha}\,] \big\} = 0$ for all r. By A6,
$1 - \gamma < 2\Gamma$.

It remains to establish the asymptotic behaviour of $T_n = -A\,n^{(1-\gamma)/2}\,\Delta_n$. If A5(b)
holds and $\gamma \ge \frac13$, then by Lemma 2.3, $T_n \to 0$. Under A5(c) or A5'(a), with $\gamma > \frac15$,
again $T_n \to 0$. If A5'(b) holds and $\gamma = \frac15$, then

$$T_n \;\to\; -\,\frac{A\, C^2\, C_1\, H\circ g^{(3)}(\phi)}{6\, g(\phi)}$$

by Lemma 2.3. The limiting normal distributions indicated in Theorem 2.2 then follow
from the Fabian result. □
2.4 Random Block Replacement Policies

For the estimation procedure of Section 2.2 it is of course not necessary to observe
in entirety the renewal processes $N_n = \{N_n(t),\ t \ge 0\}$, $n \ge 1$; one needs only the
randomly truncated processes $\{N_n(t),\ t \in [0,\, g(\phi_n + c_n)]\}$, $n \ge 1$. Thus in practice one
could make planned replacements at time intervals of random length $g(\phi_n + c_n)$ and also
make replacements at any intervening failures, possibly reducing costs.

Consider those procedures for estimating $\phi^*$ which, like the SA algorithm just
introduced, can be carried out concurrently with a random BRP. This means that given
a planned replacement at time t, the next planned replacement will be made at time
$t + T_n$, where $T_n$ is a random variable observable at time t. $T_n$ may be an estimate of $\phi^*$.
For such a procedure let $N_F(t)$ and $N_P(t)$ denote the number of failure and planned
replacements through time t. Define $R(t) = C_1 N_F(t) + C_2 N_P(t)$. Then R(t)/t is the
incurred cost per unit time of using the procedure through time t. One desirable
property that such a procedure might possess is that $\lim_{t\to\infty} R(t)/t = B(\phi^*)$, the minimum
possible expected long-run cost per unit time. The following theorem shows that this is
indeed the case when $T_n \to \phi^*$ a.s., F has increasing failure rate, and H is continuous at
$\phi^*$. The result and proof are similar to Theorem 1.1 of Frees (1983), which dealt with
random age replacement policies.

For the remainder of this chapter let $\{X_{ij}\}_{i,j=1}^{\infty}$ be independent and identically
distributed random variables with distribution function F defined on a probability space
$(\Omega, \mathcal{F}, P)$. Take $N_n = \{\,N_n(t),\ t \ge 0\,\}$ to be the renewal process associated with the
sequence $\{X_{nj}\}_{j=1}^{\infty}$, with renewals occurring at $X_{n1}$, $X_{n1}+X_{n2}$, $X_{n1}+X_{n2}+X_{n3}, \ldots$ for
n = 1, 2, .... Define the $\sigma$-fields $\mathcal{F}_n = \sigma(\,X_{ij},\ i = 1, 2, \ldots, n-1,\ j = 1, 2, \ldots\,)$.
Theorem 2.3: Assume F has increasing failure rate and finite mean $\mu_1$. Suppose $\{T_n\}$
is a sequence of random variables such that $T_n$ is $\mathcal{F}_n$-measurable and $T_n \to \phi^*$ a.s.
Further assume H is continuous at $\phi^*$. Define $R_n = C_1 \sum_{j=1}^{n} N_j(T_j) + C_2\, n$, the incurred
cost through the time of the n-th planned replacement. With N(t) the number of
planned replacements through time t,

$$\lim_{t\to\infty} \frac{R_{N(t)}}{t} \;=\; B(\phi^*)\,.$$

Proof: Let $U_n = \sum_{j=1}^{n} \big[\, N_j(T_j) - H(T_j) \,\big]$, so that $\{U_n, \mathcal{F}_{n+1}\}$ is a martingale. By the
increasing failure rate assumption and (2.6),

$$\sum_{n=1}^{\infty} n^{-2}\, E\big[\, (U_n - U_{n-1})^2 \mid \mathcal{F}_n \,\big] \;<\; \infty \qquad \text{a.s.}$$

This implies that $U_n/n \to 0$ a.s. (See Theorem 5 of Chow (1965).) By the continuity of
H at $\phi^*$ and since $T_n \to \phi^*$ a.s.,

$$n^{-1} \sum_{j=1}^{n} H(T_j) \;\to\; H(\phi^*) \qquad \text{a.s.}$$

Hence it follows that

$$\frac{R_n}{n} \;\to\; C_1 H(\phi^*) + C_2 \qquad \text{a.s.}$$

Define $L_n = \sum_{j=1}^{n} T_j$, the time of the n-th planned replacement. Necessarily,
$L_{N(t)} \le t \le L_{N(t)+1}$. Thus the inequality

$$\frac{N(t)}{N(t)+1} \cdot \frac{R_{N(t)}/N(t)}{L_{N(t)+1}/[N(t)+1]} \;\le\; \frac{R_{N(t)}}{t} \;\le\; \frac{R_{N(t)}/N(t)}{L_{N(t)}/N(t)}$$

holds. (This is identical in form to (1.8) of Frees (1983).) With probability one
$T_n \to \phi^*$ and $L_n/n \to \phi^*$. The theorem follows by letting $t \to \infty$. □
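The conclusion of Theorem 2.3 can be illustrated by simulation: run a random BRP with any interval sequence and compare the incurred cost per unit time with $B(\phi^*)$. A minimal sketch, using constant intervals and hypothetical costs:

```python
import numpy as np

rng = np.random.default_rng(7)

def random_brp_cost(T_seq, c1, c2, sampler):
    """Incurred cost per unit time of a random BRP that uses planned
    replacement intervals T_1, T_2, ...: R = c1*N_F + c2*N_P over total time."""
    cost, time = 0.0, 0.0
    for T in T_seq:
        s = 0.0
        while True:              # failure replacements within this block
            s += sampler()
            if s > T:
                break
            cost += c1
        cost += c2               # planned replacement at the end of the block
        time += T
    return cost / time

sampler = lambda: rng.gamma(2.0, 1.0)
# Any sequence T_n -> phi* gives cost/time -> B(phi*); here a constant guess.
print(random_brp_cost([1.5] * 4000, c1=5.0, c2=1.0, sampler=sampler))
```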
2.5 A Critical Assumption

This section concerns the interrelationship between the existence of an optimal
BRP and assumption A2. At this time we have not located in the literature any general
characterization of when an optimal BRP exists. Thus we emphasize by means of
examples the dependence of a unique optimal policy on the relative magnitudes of the
costs $C_1$ and $C_2$. Throughout the remainder of this section we assume that the
derivative function D(t) is continuous in t.

Assume A2 holds, so that $\phi^*$ minimizes B(t) among all finite values of t. A2 also
implies that B(t) attains no local maximum and thus $\lim_{t\to\infty} B(t) > B(\phi^*)$. Thus $\phi^*$ is the
optimal BRP and is to be preferred to only making part replacements at failure. Of
course A2 is not necessary for the existence of an optimal BRP, as can be seen in an
example that will be presented shortly. However, it does play an important role in the
proof of the strong convergence of the algorithm.

The important part of A2 can be written as the requirement that

(2.9)    $$t\,h(t) - H(t) \;<\; C_2/C_1 \ \text{ for } t < \phi^* \qquad\text{and}\qquad t\,h(t) - H(t) \;>\; C_2/C_1 \ \text{ for } t > \phi^*\,,$$

which expresses how the existence of an optimal BRP partially depends on the ratio of
the costs $C_1$ and $C_2$. To see this more explicitly, we will examine when (2.9) holds for
three lifetime distributions of the gamma type. The probability density function for
these distributions is given by

$$f(x) \;=\; \begin{cases} x^{\nu-1}\, e^{-x} / \Gamma(\nu)\,, & x > 0, \\ 0, & x \le 0, \end{cases}$$

with $\nu$ = 2, 4, and 6.

Figures 2.1(a), 2.1(b), and 2.1(c) show the function $t\,h(t) - H(t)$ for the gamma
distributions with shape parameters 2, 4, and 6, respectively. It is immediately apparent
that in all three cases the ratio $C_2/C_1$ must be sufficiently small for (2.9) and A2 to be
satisfied. Also, if $C_2/C_1$ is too large, then D(t) is negative for all finite values of t and
no finite optimal BRP exists. It can be seen from the figures that for certain
intermediate values of $C_2/C_1$, the function D(t) may have at least two zeros. One of
these could represent an optimal BRP, but since A2 fails to hold, the proposed
stochastic approximation algorithm may also fail.

The examples suggest the following general observations. If the line $y = C_2/C_1$
intersects the function $t\,h(t) - H(t)$ exactly once, there exists a finite optimal BRP
and the critical expression (2.9) holds. Otherwise, an optimal policy may or may not
exist, and if it does the stochastic approximation procedure defined in Section 2.2 may
well fail to be consistent. Thus we can imagine that for many lifetime distributions
failure replacements must be significantly more expensive than planned replacements for
the procedure to be useful.
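The curves in Figures 2.1(a)-(c) can be reproduced numerically, since the k-fold convolution of a Gamma($\nu$) distribution is Gamma($k\nu$); this sketch assumes scipy is available:

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammainc   # regularized lower incomplete gamma = gamma cdf

def th_minus_H(t, nu, kmax=200):
    """t*h(t) - H(t) for Gamma(nu, 1) lifetimes, using the fact that the
    k-fold convolution of Gamma(nu) is Gamma(k*nu):
    H(t) = sum_k P(Gamma(k*nu) <= t),  h(t) = sum_k Gamma(k*nu) pdf at t."""
    ks = np.arange(1, kmax + 1)
    H = sum(gammainc(k * nu, t) for k in ks)
    h = sum(gamma.pdf(t, a=k * nu) for k in ks)
    return t * h - H

# The curve crosses a given cost ratio C2/C1 once or twice depending on nu.
for t in (1.0, 3.0, 5.0, 10.0):
    print(t, th_minus_H(t, nu=2.0))
```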
[Figure 2.1(a): $t\,h(t) - H(t)$ for the gamma lifetime distribution with shape parameter 2.]

[Figure 2.1(b): $t\,h(t) - H(t)$ for the gamma lifetime distribution with shape parameter 4.]

[Figure 2.1(c): $t\,h(t) - H(t)$ for the gamma lifetime distribution with shape parameter 6.]
CHAPTER 3
REFINEMENTS AND A FULLY SEQUENTIAL ESTIMATION PROCEDURE
3.1 Introduction
In this chapter we consider several refinements to the procedure introduced in
Section 2.2.
In Section 3.2 the simple renewal density estimator used in (2.2) is replaced by a
more general kernel estimator.
Under certain conditions this modification reduces the
bias at each step of the procedure and consequently results in a faster rate of
convergence.
As seen in Theorem 2.2, the asymptotic mean square error of the estimate
depends on the tuning constants A and C, which can be chosen by the user.
Sections
3.3 and 3.4 address the problem of an adaptive stochastic approximation procedure that
estimates the optimal choices of A and C so as to achieve the minimum possible
asymptotic mean square error.
Finally, in Section 3.5 construction of a fixed-width confidence interval for the
optimal block replacement policy is considered.
This leads to the introduction of a
stopping time which can provide guidance in deciding when the algorithm has been
iterated a sufficient number of times for a desired level of confidence.
3.2 Improving The Rate Of Convergence
This section is principally concerned with achieving a faster rate of convergence
for the procedure (2.1) by using a kernel estimator for $H\circ g^{(1)}$. Kernel estimation of
$H\circ g^{(p+1)}$ for integer p greater than zero is also considered here for applications in
Sections 3.4 and 3.5. The basic approach is that taken by Singh (1977) for kernel
estimation of derivatives of a distribution function. Frees (1983) employed estimators
like those suggested by Singh to speed the rate of convergence of a stochastic
approximation procedure. The development undertaken here is like that of Frees except
that interest now lies in estimating derivatives of $H\circ g$.

Let $0 \le p < r$ where p and r are integers. We will consider the problem of
estimating $H\circ g^{(p+1)}(x)$, assuming that $H\circ g^{(r+1)}(x)$ exists and is bounded. Let $hg_n$
continue to represent an estimator of $H\circ g^{(1)}$. For $p \ge 0$, $hg_n^{(p)}$ will denote an estimator
of $H\circ g^{(p+1)}$. It is desirable to have estimators $hg_n$ and $hg_n^{(p)}(x)$ which satisfy the
following conditions:
A7: (a) $\sup_x \big|\, E\,hg_n^{(p)}(x) - H\circ g^{(p+1)}(x) \,\big| \;=\; O\big( n^{-(r-p)/(2r+1)} \big)$.
(b) For $t > -1$, $\sup_x E\big|\, hg_n(x) \,\big|^{t+1} \;=\; O\big( n^{t/(2r+1)} \big)$.

A8: For any integer $t \ge 0$ and sequence $\{x_n\}$ such that $\lim_{n\to\infty} x_n = x_0$, there exist
constants $\varphi_1$ and $\varphi_{2,t}$ such that
(a) $\lim_{n\to\infty} n^{(r-p)/(2r+1)}\,\big[\, E\,hg_n^{(p)}(x_n) - H\circ g^{(p+1)}(x_n) \,\big] = \varphi_1$, and
(b) $\lim_{n\to\infty} n^{-t/(2r+1)}\, E\big[\, hg_n(x_n) \,\big]^{t+1} = \varphi_{2,t}$.
used in the SA algorithm considered in Section 2.2 and r> 2, then 4>n will converge to 4>
at a faster rate than implied by Theorem 2.2.
As a first step a kernel estimator of
Hog(p+1)(x) is defined which satisfies A7 andA8.
Denote by B the class of all Borel-measurable bounded functions K that vanish
outside the interval ( -1, 1). Take
K p = { K E B such that
~
J.
+1
f
1, if j=p,
}.
j
u K(u)du =
-1
0, ifj:;=p,j=O, 1, ... , r-1
As in Section 2.2, let N n , n = 1, 2, ... be independent renewal processes, each
having underlying distribution F with renewals at times S m. = X n1 + X n 2 + ... + X nJ"
i = 1, 2,.... For K E K p define
(3.1)
00
= c~~1 i=l
L
where { cn} is a sequence of positive constants.
The next several results are analogous to those of Singh(1977) and Frees(19S3)
concerning estimation of the derivatives of a distribution function.
provide sufficient conditions for A7 and A8 to hold for the estimator
The lemmas will
hg~P)(x).
Part (a)
of Lemma 3.1 corresponds to Theorem 3.1 of Singh, while part (b) is similar in spirit to
Lemma 3.1 (b) of Frees.
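A minimal sketch of the estimator (3.1) for p = 0; the Epanechnikov kernel lies in $K_0$ with r = 2, and the choice of g, the lifetimes, and the bandwidth are illustrative assumptions:

```python
import numpy as np

def hg_kernel(renewal_epochs, x, cn, g_inv, K, p=0):
    """Kernel estimate (3.1) of (H o g)^(p+1)(x) from one renewal path:
    hg_n^(p)(x) = c_n^{-(p+1)} * sum_i K((g^{-1}(S_ni) - x)/c_n)."""
    u = (g_inv(np.asarray(renewal_epochs)) - x) / cn
    return K(u).sum() / cn ** (p + 1)

# For p = 0, any symmetric density-like kernel is in K_0 (so r = 2 here):
epanechnikov = lambda u: 0.75 * (1 - u ** 2) * (np.abs(u) <= 1)

rng = np.random.default_rng(5)
epochs = np.cumsum(rng.gamma(2.0, 1.0, size=5000))
g_inv = lambda s: np.log(np.maximum(s - 0.1, 1e-12))   # inverse of g(t) = 0.1 + e^t
print(hg_kernel(epochs, x=1.0, cn=0.1, g_inv=g_inv, K=epanechnikov))
```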
Lemma 3.1: Let $K \in K_p$ and $hg_n^{(p)}(x)$ be defined by (3.1). Suppose $H\circ g^{(r+1)}$ is
bounded. Assume $c_n = o(1)$. Then
(a) $\sup_x \big|\, E\,hg_n^{(p)}(x) - H\circ g^{(p+1)}(x) \,\big| = O\big( c_n^{r-p} \big)$.
(b) Also suppose that $\{x_n\}$ is a sequence such that $\lim_{n\to\infty} x_n = x_0$, where
$H\circ g^{(r+1)}(x)$ is continuous in a neighborhood of $x_0$. Then

$$\lim_{n\to\infty} c_n^{-(r-p)}\,\big[\, E\,hg_n^{(p)}(x_n) - H\circ g^{(p+1)}(x_n) \,\big] \;=\; H\circ g^{(r+1)}(x_0)\, \frac{1}{r!}\int_{-1}^{+1} u^r K(u)\,du\,.$$
Proof: By the substitution $t = \dfrac{g^{-1}(s) - x}{c_n}$ and an application of bounded convergence
one has that

$$E\,hg_n^{(p)}(x) \;=\; \frac{1}{c_n^{p+1}} \int K\Big( \frac{g^{-1}(s) - x}{c_n} \Big)\,dH(s) \;=\; \frac{1}{c_n^{p}} \int_{-1}^{+1} K(t)\, h\circ g(x + c_n t)\, g^{(1)}(x + c_n t)\,dt \;=\; \frac{1}{c_n^{p}} \int_{-1}^{+1} K(t)\, H\circ g^{(1)}(x + c_n t)\,dt\,.$$

Expanding $H\circ g^{(1)}(x + c_n t)$ in a Taylor series about x,

$$H\circ g^{(1)}(x + c_n t) \;=\; \sum_{j=0}^{r-1} \frac{H\circ g^{(j+1)}(x)}{j!}\,(c_n t)^j + R_n(t)\,,$$

where $R_n(t) = \dfrac{H\circ g^{(r+1)}(\eta(t))}{r!}\,(c_n t)^r$ with $|\eta(t) - x| \le |c_n t|$. Thus

$$E\,hg_n^{(p)}(x) \;=\; H\circ g^{(p+1)}(x) + \frac{1}{c_n^{p}} \int_{-1}^{+1} K(t)\, \frac{H\circ g^{(r+1)}(\eta(t))}{r!}\,(c_n t)^r\,dt \;=\; H\circ g^{(p+1)}(x) + O\big( c_n^{r-p} \big)$$

since $K \in K_p$ and $H\circ g^{(r+1)}$ is bounded. This establishes (a).

By part (a) it follows that

$$\lim_{n\to\infty} c_n^{-(r-p)}\,\big[\, E\,hg_n^{(p)}(x_n) - H\circ g^{(p+1)}(x_n) \,\big] \;=\; H\circ g^{(r+1)}(x_0)\,\frac{1}{r!}\int_{-1}^{+1} t^r K(t)\,dt$$

by the bounded convergence theorem and the continuity of $H\circ g^{(r+1)}$ at $x_0$. Thus (b)
holds. □

Note that under the assumptions of Lemma 3.1, if $c_n = C n^{-\gamma}$, where $\gamma = \frac{1}{2r+1}$,
then A7(a) holds. Then by Lemma 3.1(b) one also has that

$$\lim_{n\to\infty} n^{(r-p)/(2r+1)}\,\big( E\,hg_n^{(p)}(x_n) - H\circ g^{(p+1)}(x_n) \big) \;=\; \lim_{n\to\infty} C^{\,r-p}\, c_n^{-(r-p)}\,\big( E\,hg_n^{(p)}(x_n) - H\circ g^{(p+1)}(x_n) \big)\,,$$

so that A8(a) holds with $\varphi_1 = C^{\,r-p}\, H\circ g^{(r+1)}(x_0)\, \frac{1}{r!}\int_{-1}^{+1} K(t)\,t^r\,dt$.

The following lemma establishes results for $hg_n^{(p)}(x)$ similar to those contained in
Lemma 3.2 of Frees (1983).
Lemma 3.2: Let $K \in K_0$ and $hg_n(x)$ be defined by (3.1) with p = 0. Assume $H\circ g^{(1)}$ is
a bounded function that is continuous at $x_0$, and that $\{x_n\}$ is a sequence such that
$\lim_{n\to\infty} x_n = x_0$. Further assume that $c_n = o(1)$.
(a) If $t > -1$, then $\sup_x E\big|\, hg_n(x) \,\big|^{t+1} = O\big( c_n^{-t} \big)$.
(b) If $t \ge 0$ is an integer, then

$$\lim_{n\to\infty} c_n^{t}\, E\big[\, hg_n(x_n) \,\big]^{t+1} \;=\; H\circ g^{(1)}(x_0) \int_{-1}^{+1} \big[ K(u) \big]^{t+1}\,du\,.$$

Proof: By the definition of $hg_n$, (2.5), and the boundedness of K and $H\circ g^{(1)}$, part (a)
follows.

To show part (b), we condition on the number of renewals occurring in the
interval $\big( g(x_n - c_n),\, g(x_n + c_n) \big]$. By (2.4) and the boundedness of K,

$$\sum_{j=2}^{\infty} E\Big\{ c_n^{t}\,\big[ hg_n(x_n) \big]^{t+1} \,\Big|\, Y_n = j \Big\}\, P\{Y_n = j\} \;\le\; c_n^{-1}\, \|K\|_{\infty}^{t+1} \sum_{j=2}^{\infty} j^{t+1}\, P\{Y_n = j\} \;\to\; 0 \ \text{ as } n \to \infty\,.$$
The following result will be used to find the conditional expectation when $Y_n = 1$.
Let $X_1, X_2, \ldots$ be i.i.d. random variables with distribution function F which define a
renewal process with renewals occurring at $S_n = X_1 + \cdots + X_n$, n = 1, 2, .... Define the
excess random variable at time t to be

$$\gamma(t) \;=\; S_{N(t)+1} - t\,,$$

so that $\gamma(t)$ is the remaining life of the part in use at time t. This random variable has
distribution function

$$P\{\gamma(t) \le x\} \;=\; \int_0^t \big[\, F(t-u+x) - F(t-u) \,\big]\,dH(u) \;+\; \big[\, F(t+x) - F(t) \,\big]\,.$$

(See Barlow and Proschan (1965), Theorem 2.8.) Thus the distribution function and the
density of the first renewal to occur after time $g(x_n - c_n)$ are given by the
corresponding expressions, the density being

$$a(v) \;=\; h(v) - \int_{g(x_n - c_n)}^{v} f(v-u)\,dH(u)$$

for $v \ge g(x_n - c_n)$. Thus

$$P\{Y_n = 1\} \;=\; \int_{g(x_n - c_n)}^{g(x_n + c_n)} a(v)\,\big[\, 1 - F\big( g(x_n + c_n) - v \big) \,\big]\,dv\,,$$

and, given that $Y_n = 1$, the conditional density of the lone renewal in the interval is

$$b(v) \;=\; \frac{a(v)\,\big[\, 1 - F\big( g(x_n + c_n) - v \big) \,\big]}{P\{Y_n = 1\}}\,.$$

Hence, by a change of variables,

$$E\Big\{ \frac{1}{c_n} \int_{g(x_n - c_n)}^{g(x_n + c_n)} \Big[ K\Big( \frac{g^{-1}(u) - x_n}{c_n} \Big) \Big]^{t+1} dN_n(u) \,\Big|\, Y_n = 1 \Big\}\, P\{Y_n = 1\} \;=\; \int_{-1}^{+1} \big[ K(v) \big]^{t+1}\, a\big( g(x_n + c_n v) \big)\,\big[\, 1 - F\big( g(x_n + c_n) - g(x_n + c_n v) \big) \,\big]\, g^{(1)}(x_n + c_n v)\,dv\,,$$

where

$$a\big( g(x_n + c_n v) \big) \;=\; h\circ g(x_n + c_n v) - \int_{g(x_n - c_n)}^{g(x_n + c_n v)} f\big( g(x_n + c_n v) - u \big)\,dH(u)\,.$$

Since K and f are bounded,

$$\int_{-1}^{+1} \big[ K(v) \big]^{t+1} \int_{g(x_n - c_n)}^{g(x_n + c_n v)} f\big( g(x_n + c_n v) - u \big)\,dH(u)\,\big[\, 1 - F(\cdot) \,\big]\, g^{(1)}(x_n + c_n v)\,dv \;\le\; 2\,\|K\|_{\infty}^{t+1}\, \|f\|_{\infty}\, \|g^{(1)}\|_{\infty} \int_{g(x_n - c_n)}^{g(x_n + c_n)} dH(u) \;\to\; 0$$

by the continuity of F and g. By similar bounds, and since $\lim_{l\to 0} F(l) = 0$,

$$\lim_{n\to\infty} \int_{-1}^{+1} \big[ K(v) \big]^{t+1}\, h\circ g(x_n + c_n v)\, g^{(1)}(x_n + c_n v)\,dv \;=\; H\circ g^{(1)}(x_0) \int_{-1}^{+1} \big[ K(v) \big]^{t+1}\,dv\,,$$

which completes the proof of the lemma. □
Suppose $c_n = C n^{-\gamma}$ where $\gamma = \frac{1}{2r+1}$. Then Lemma 3.2(a) implies that A7(b)
holds for the kernel estimator $hg_n$. Similarly, $hg_n$ satisfies A8(b) by Lemma 3.2(b).

Let the sequence $\{\phi_n\}$ of estimates of $\phi$ be defined recursively by (2.1), where in
the definition (2.2) the estimator $hg_n(x)$ is redefined to be any estimator of $H\circ g^{(1)}(x)$.
The following theorem establishes some convergence properties for $\phi_n$ assuming that
$hg_n(x)$ satisfies A7 and A8. The subsequent corollary addresses the special case when
$hg_n(x)$ is of the form (3.1) with p = 0.
Theorem 3.1: Let $\gamma = \frac{1}{2r+1}$, $c_n = C n^{-\gamma}$, and $\Gamma = A\,\frac{d}{dt}Dg(t)\big|_{t=\phi}$ for the positive
constants A and C of A6.
(a) Assume A1-A4 and A7 hold. Then $\phi_n \to \phi$ a.s.
(b) Assume A1-A3, A6, A7, and A8 hold. Then $n^{(1-\gamma)/2}(\phi_n - \phi) \to_D N(\mu_0, \sigma_0^2)$,
where

$$\mu_0 \;=\; -\,\frac{A\, C_1\, \varphi_1}{g(\phi)\,\big[\,\Gamma - (1-\gamma)/2\,\big]} \qquad\text{and}\qquad \sigma_0^2 \;=\; \frac{A^2\, C_1^2\, \varphi_{2,1}}{[g(\phi)]^2\,\big[\,2\Gamma - (1-\gamma)\,\big]}\,.$$
Proof of Theorem 3.1(a): Again take $U_n = \phi_n - \phi$. By A7(a) with p = 0,

$$\sup_x \big|\, E[hg_n(x)] - H\circ g^{(1)}(x) \,\big| \;=\; O\big( n^{-r/(2r+1)} \big)\,.$$

This implies that

$$|\Delta_n| \;\le\; K_1\, n^{-r/(2r+1)}\,.$$

By A7(b) with t = 1,

$$\sup_x E\big[\, hg_n(x) \,\big]^2 \;=\; O\big( n^{1/(2r+1)} \big)\,,$$

so that by the proof of Lemma 2.2,

$$E_{\mathcal{F}_n}\big[\, Dg_n(\phi_n) \,\big]^2 \;\le\; K_2\, n^{1/(2r+1)}\,.$$

Thus, from the proof of Theorem 2.1, the required summations are finite by A4 with
$c_n = C n^{-\gamma}$, and $\phi_n \to \phi$ a.s. by an application of the Robbins-Siegmund result. □
The proof of Theorem 3.1(b) will require the following lemma, similar to Lemma
2.5(b). As in Section 2.2, let $V_n = c_n^{1/2}\big[\, Dg_n(\phi_n) - Dg(\phi_n) - \Delta_n \,\big]$.

Lemma 3.3: Assume A1-A3, A6 (with $\gamma = 1/(2r+1)$), A7 and A8 hold. Then

$$\lim_{n\to\infty} E_{\mathcal{F}_n} V_n^2 \;=\; \frac{C_1^2\, C\, \varphi_{2,1}}{[g(\phi)]^2}\,.$$

Proof: By definition $V_n^2 = c_n C_1^2\,[\,X_n - Y_n - Z_n\,]^2$, where $X_n$, $Y_n$, and $Z_n$ are as in the
proof of Lemma 2.5. As in Lemma 2.5, we consider separately the conditional
expectations $E_{\mathcal{F}_n} X_n^2$, $E_{\mathcal{F}_n} X_n Y_n$, etc. Since the conditions of Theorem 3.1(a) hold,
$\phi_n \to \phi$ a.s.

(i) Since $\phi_n \to \phi$ a.s., by A8(b) with t = 1,

$$c_n E_{\mathcal{F}_n} X_n^2 \;=\; c_n\,[g(\phi_n)]^{-2}\, E_{\mathcal{F}_n}\big[\, hg_n(\phi_n) \,\big]^2 \;\to\; \frac{C\,\varphi_{2,1}}{[g(\phi)]^2}\,.$$

(ii) By A3 and A7(b) with t = 0, $c_n E_{\mathcal{F}_n} X_n Y_n \le K_1 c_n\, O(1) \to 0$.

(iii) By (2.6), A3, A7(b) with t = 1, and the conditional version of the Cauchy-Schwarz
inequality, $c_n E_{\mathcal{F}_n} X_n Z_n \le c_n K_2\,\big\{ O(n^{1/(2r+1)}) \big\}^{1/2} \to 0$.

(iv) Also by A3 and A7(b) (with t = 0),
$c_n E_{\mathcal{F}_n} Y_n^2 = c_n\,[g(\phi_n)]^{-2}\big[\, E_{\mathcal{F}_n} hg_n(\phi_n) \,\big]^2 \to 0$.

(v) For all n, $E_{\mathcal{F}_n} Y_n Z_n = 0$.

(vi) By part (vi) of the proof of Lemma 2.5, $\lim_{n\to\infty} c_n E_{\mathcal{F}_n} Z_n^2 = 0$.

The result follows from (i)-(vi). □

Proof of Theorem 3.1(b): Again we employ the theorem on asymptotic normality due
to Fabian. From the proof of Theorem 2.2,

$$U_{n+1} \;=\; U_n\big[\, 1 - n^{-1}\Gamma_n \,\big] + n^{\gamma/2 - 1}\,\Phi_n V_n + n^{-1}\, n^{-(1-\gamma)/2}\,T_n\,,$$

where as before $\Gamma_n = A\,\frac{d}{dt}Dg(t)\big|_{t=\eta_n} \to \Gamma = A\,\frac{d}{dt}Dg(t)\big|_{t=\phi}$ a.s., $\Phi_n = \Phi = -A C^{-1/2}$,
and $T_n = -A\, n^{(1-\gamma)/2}\,\Delta_n$. The asymptotic behaviour of $T_n$ now depends on how
many derivatives of $H\circ g$ exist. Since $\phi_n \to \phi$ a.s., using A8(a) with p = 0 gives

$$T \;=\; \lim_{n\to\infty} T_n \;=\; -\,A\, C_1 \lim_{n\to\infty}\,\big[ g(\phi_n) \big]^{-1}\, n^{(1-\gamma)/2}\,\big( E_{\mathcal{F}_n}[hg_n(\phi_n)] - H\circ g^{(1)}(\phi_n) \big) \;=\; -\,\frac{A\, C_1\, \varphi_1}{g(\phi)}\,.$$

By definition $E_{\mathcal{F}_n} V_n = 0$. By Lemma 2.4 the conditional second moment of $V_n$ is
bounded by A7(b), and by Lemma 3.3, $E_{\mathcal{F}_n} V_n^2 \to C_1^2\, C\, \varphi_{2,1} / [g(\phi)]^2$.

We need to show that $\sigma_{n,r}^2 = E\{ V_n^2\,\chi[\,V_n^2 \ge rn\,] \} \to 0$ as $n \to \infty$ for r = 1, 2, ....
To accomplish this using Lemma 2.7 it is first shown that for any p > 2,
$E_{\mathcal{F}_n}\big|\, c_n^{1/2} V_n \,\big|^p \to 0$. By Lemma 2.4 and A7(b), this expectation is
$O\big( n^{-1/(2r+1)} \big) \to 0$ if p > 2; taking t = 0 in A7(b) handles the remaining term. For
Lemma 2.7, p, q > 0 were such that $\frac1p + \frac1q = 1$; thus, taking p > 2, Lemma 2.7 holds.

Taking $\alpha = 1$ and $\beta = \beta_+ = 1 - \gamma$, part (b) of Theorem 3.1 now follows from
Fabian's theorem. □
Now let K E KO and define hg n by (3.1) with p=O. Lemmas 3.1 and 3.2 apply to
hg n if the following assumption is met.
- 54 -
A9:
For some integer r
~ 1,
Hog(l) and Hog(r+1) exist and are bounded on the entir'e
real line, and both functions are continuous in a neighborhood of 4>.
Assume A1-A3, A6, and A9 hold. Let r = 2r~1 in A6 (for the r of
Corollary a..l;,
A9) and
r
= A3tDg(t)lt=4>' Then
(a) 4>n -+ 4> a.s.
and
(b) n(1-r)/2 (4)n -4»
~
ITT)
N(1l1'
where
and
with
PI = ~r.
+1
Ju r K(u)du
-1
and
[K(u) ]2 du .
2= J+1
-1
P
Proof: By the discussions following Lemmas 3.1 and 3.2, A9 implies A7 and AS hold for
hgn(x). The corollary is then an immediate consequence of Theorem 3.1 with
~1
= C r Hog(r+1)(4»
~
+1
J
u r K(u)du
-1
and
~2,1
= lim
n-+oo
n -1/(2r+1) EGr
-;Tn
[
hg n (4)n) ]2
= C- 1 Hog(l)(4» J+1 [ K(u) ]2 duo
-1
- 55 -
o
3.3 The Optimal Tuning Constants
e·
In this section we consider the problem of choosing the constants A and C so as to
minimize the asymptotic mean square error of 4>n.
From this section onwards it is
assumed that in the algorithm defined by (2.1) and (2.2) hg n is the kernel estimator
introduced in Section 3.2.
From Corollary 3.1 it appears that the limiting distribution
of 4>n depends on the function g as well as on the renewal function H. This dependence
will be explored further to assess how important the choice of g is to the behaviour of
the procedure.
Let
e be a
normal random variable with mean 1'1 and variance CTI ' where PI and
CTI have been defined in Corollary 3.1. Then under the conditions assumed in Corollary
3.1,
Suppose the quadratic risk function E(
e- 4»2 is used
as the criterion in estimating 4>.
Then once the rate of convergence has been determined by the kernel estimator hg n , a
measure of performance for the procedure is Ee 2 , the asymptotic mean square error of
4>n. One would naturally like to minimize this quantity, which depends in part on A and
C.
By taking partial derivatives of
Ee 2
=
IJI
optimal choices of these constants are found to be
and
- 56 -
+ CTI
with respect to A and C, the
e
Thus the minimum possible mean square error (for the particular kernel K used in hg n )
is
Unfortunately, the unknown constants Hog(1)(q'», Hog(2)(q'», and Hog(r+1\q'»
partly determine A opt and Copt.
In Section 3.4 an adaptive algorithm is introduced
where A opt and Copt are estimated at each stage of the procedure so as to minimize the
limiting mean square error.
As in Section 2.2, let q'>* = g(q'»
be the optimal block
replacement policy. Redefine q'>~ so that q'>~ = g(q'>n), an estimate of q'>*. Since the real
interest is in estimating q'>*, the remainder of this section examines to what extent the
limiting distribution of q'>~ depends on g.
Under the conditions of Corollary 3.1, n(1-r)/2 (q'>n - q'»
~
N( J.l1'
uI)'
Since g
is an appropriate differentiable function, by the delta-method
and
=
K2A2C-1[g(1)(q'»]2Hog(1)(q'»
2r - (1-r)
Note that
r
= A3tDg(t)lt=q'> = AD(1)(q'>*)[g(1)(q'»]2 (where D was defined in
Section 2.2) since D( q'>*) = O. Thus the denominators of the above expressions may be
- 57 -
freed from dependence on g by taking A = A o [g(1)(¢)]-2 for some positive constant
A o . This results in
and
= K 2 A~ C- 1 [g(l)( 41)] -1 h( 41*)
2Ao O(1)(¢*) - (1-..,.)
Letting C
= Co [ g(l)(¢) ] -1 for some positive constant Co,
and
=
K 2 A~ Col h(¢*)
2 A o O(l)(¢*) - (1-..,.) ,
Thus the asymptotic bias of ¢ri depends on g while the variance does not.
Since the asymptotic mean square error of ¢ri is [g(1)(¢)]2 [JJI
+
uiL the best
choices of A and C for estimating 41* are the same constants A opt and Copt' By setting
A opt
= Ao [g(l)(¢)] -2
and Copt
= C o [g(l)(¢)] -1
it can be seen that one should let
A o = [0(1)(41*)] -1 and Co = g(l)(¢)C opt ' Therefore, for estimation of 41* the optimal
choice of C depends on g while the best choice of A does not.
- 58 -
e·
3.4 An Adaptive Stochastic Approximation Algorithm
It was seen in Section 3.3 that by taking
J-
d
A = A opt = [ dtDg(t)
It =4>
1
and
· mean square error
t he asymptotic
0
f n (1- 1 )/2(-1.
-I.)'IS mlDlmlze
'"
d.
'l'n - 'I'
The constants
A opt and Copt of course depend on the unknown quantities 3t Dg(t) It =4>' Hog(l)(4»,
and Hog(r+l)(4».
In this section consistent estimates of these quantities are defined
which in turn yield consistent estimators An and C n of A opt and Copt' respectively.
Taking an = An n -1 and Cn = C n n - 1 then results in the minim urn asym ptotic mean
1
square error possible given the kernel K used to estimate Hog( ) (4)). (The choice of K
determines the constants
/3 1 and /3 2 which appear in the limiting mean square error.)
The problem of choosing K is not considered here.
Let {Ari} and {cri} be sequences of estimates of A opt and Copt' as yet to be
defined. Given positive constants Zl' Z2' Z3' Z4' a, b, c, and 1, let
(3.2)
and
(3.3)
These definitions ensure that An and C n fall within known bounds. For large enough
- 59 -
11,
e·
(3.4)
and
(3.5)
As in Section 3.2, let 'Y = 2r~1' Assume the following restrictions on a, b, c, and I are
met.
A10: a + rc
<
r'Y, 2a + b + 'Y
<
1, 3(b+'Y)
<
1, and b
<
r'Y/(r+1).
With cn=Cnn-'Y, define hgn(x) by (3.1) with p=O and K=KOED(O' Using this
modified version of hg n in Dg n , let successive estimates of t/J be defined by
(3.6)
Before introducing the estimators A~ and C~ and thus completing the definitions of An
and C n , the modified SA procedure is shown to be consistent for t/J.
lemma provides a bound for the bias
(~n)
The following
of Dg n (t/Jn). The ensuing theorem provides
sufficient conditions for t/Jn to converge to t/J with probability one.
Lemma 3.4: Suppose Hog
(r+1) .
IS
bounded. Assume Al and A3 hold. Then there exist
positive constants K 1 and K 2 such that
Proof: By Lemma 3.1 and (3.3) ,
supl Et:r hgn(x) - Hog
x
~n
(1)
(x) I
- 60 -
Thus
~ K2 n
r(c-'Y)
o
.
Theorem 3.2: Assume A1-A3, A9, and A10 hold. Then tPn ..... tP a.s.
Proof: Since Lemma 3.2 (a) holds, as in the proof of Theorem 3.1
< K
for large enough n.
an
6n
b+'Y
From the proof of Theorem 3.1 and by Lemma 3.4, using
=Ann -1 ,
E':fn U~+1
2
2
:5 U; + a;E':fn[Dg;(tPn)] + 2anl Un II An I - 2an( tPn - tP )Dg(tPn)
~ Un + an K 6 n
b+'Y
+ 2ann
r(c-'Y)
[K 7 + KsltPn -tPl
2
] - 2an(tPn -tP)Dg(tPn)
By (3.2), an = Ann -1 :5 Z2na-1 for large n. Thus by AI0,
L: a; nb+ l' <
00
n=1
and
- 61 -
00
~
LJ an n
n=I
Since for large enough n,
r(c-'}')
~
:5 Z2 LJ n
a-I+r(c-'}')
e·
< 00.
n=1
an~ ZI(log n)-In-I, it is the case that ~ an =00. Thus
n=I
the conditions of the Robbins-Siegmund result are met and <Pn ....
<P
a.s.
o
An estimator of
will be required to define the sequence of random variables {Ari}. Thus for K 1 E II( 1 let
hg~I)(x) be defined
by (3.1) with Cn = Cnn -'}' and p = 1. Next, take
(1)
(2)
CIg(<Pn)hg n (<Pn) - g (<PnHCINn(g(<Pn» + C 2 ]
[ g( <Pn) ]2
and
*
A n +I = [ n
(3.7)
E
-1 n
-1
a· ]
.
. 1 J
J=
Lemma 3.5 gives sufficient conditions for An defined by (3.2) to be a consistent
estimator of A opt with A~ defined by (3.7).
Lemma a.J2:. Assume AI-A3, A9, and AlO hold. Further suppose H og(2) is continuous
in a neighborhood of <p. Then An .... A opt a.s.
Proof: By Lemma 3.1,
E
a
_ CIg(<Pn) [Hog
~n n -
(2)
r-1
(2)
(<Pn) + O( cn )] - g (<Pn) [ C 1 H(g(<Pn»
[g(4)n)]2
- 62 -
+ C2 ]
-as n -+
n
00,
C g(¢)Hog(2)(¢) - g(2)(¢) [C H(g(¢)) + C ]
2
1
1
[ g(¢) ]2
since C n = o( 1) and Theorem 3.2 holds. Thus n -1
~ EG!'. oJ'
j=1
-+
JJ
A;;p\ a.s. as
-+ 00.
n
Let M n =.E [0. - EGJ 0.]. Then {M n , GJn+l}~=1 is a zero-mean martingale.
J=1
J
j J
Furthermore, by AI, A3 and (2.6),
by (2.5). Thus, since cn- 1
~
K 7 nb+/ for large enough n, where 3(b+/)
~
< K8 ~
n=1
- 63 -
n
-2
n
3(b+/)
<
00.
<
1 by AID,
It follows by Chow(1965) that [A~+l]-l ... A~pl
(3.2).
a.s. and thus An ... A opt a.s. by
0
As a first step in defining
C~
an estimator of Hog(l)(¢» is introduced. With the
modified version of hg n defined earlier in this section take
n
L
\)n = n- l
hgn(¢>n)·
j=l
In Lemma 3.6 \)n is shown to be a consistent estimator of Hog(l)(¢».
Lemma
~
Assume Al-A3, A9, and AlO hold. Then \)n ... Hog(l)(¢»
a.s.
Proof: By Lemma 3.1, (3.3), and Theorem 3.2 ,
as n ...
00.
Thus
Letting M n = E·~l [ hg.(¢>.) - E'!f hg.(¢>.) ], it is again the case that {M n , '!f +1 }OO_1
J-
J
J
j
J
J
is a zero-mean martingale. By Lemma 3.2 (a),
This implies that
. 64 -
n
11-
e·
since [Cnn
-1 -1
]
~
K2n
1/3
for some constant K 2 by A10 and (3.3).
using Theorem 5 of Chow(1965), 'lt n
~
0
Hog(l)(</» a.s.
An estimator of Hog(r+1) will be needed to define
K r = { K E B such that
~
J.
Thus, agaIn
cri.
Let
1, if j=r,
+1
}
J uj K(u)du =
-1
0, ifj=O, 1, ... , r-l.
Next, for a K r E K r define
g(x+cn)
J
_1_
K
r+1
r
Cn
g(x-cn)
.
wIth cn = Cnn
-1
.
Let An = n
-1
n
(g
-1
(u) - X)dN (u)
Cn
n
(r)()
</>n'
Ej =l hgj
to Lemmas 3.5 and 3.6, establishes that An
~
The next lemma, similar in spirit
Hog(r+1)(</» a.s.
Lemma I I Assume A1-A3, A9, and A10 hold.
Further suppose that for some d > 0
and all real x, Hog(r+1)(x) =
+
H og(r+1)( </»
Hog(r+1)(</»
O(
I x-</> I d).
a.s.
Proof: First, by the usual change of variables and a Taylor expansion,
= 1 J+1 Kr(t)
crn
-1
[r~.,1
~O
J=
- 65 -
Then An
-+
where 1'1(t)-xl $ Cn for all tE[-1, 1].
Thus by the definition of K r and the
assumptions of the lemma,
(r)
(r+1)
which implies that E~nhgn (¢n) ... Hog
(41) as n ...
.
-1 ~
(r)
hm n
L...J E~ hg. (41·)
n"'oo
j=1
j J
J
By the definition of
hg~r), (2.5), and
00.
Thus
= Hog(r+1) (41).
A3
•
= K 1 t(r1+1) {[g(¢n+cn) - g(¢n-cn)] + o(g(¢n+cn) - g(¢n-cn)) }
Cn
1-t(r+1)
$ K 2 Cn
.
Following Lemma 3.5 of Frees(1983) it can be shown that there exists t E (1, 2) such
that
1-t(r+1) -t
Cn
n
(3.8)
<
00
...
which is sufficient for
- 66 -
~
E«r hg(nr)(,I,)
L.J"J n l
Y'n
-
E h (r)(,I,) It -t
GJ. gj
Y'j
n
n=1
<
00,
J
when t E (1, 2). This will establish the lemma using Chow(1965). Since there exists a
-1
b+1
..
constant K 3 such that Cn $ K 3 n
, for (3.8) to hold we need t E (1, 2) such that
-t + (b+1H t(r+l) -
1 ]
<
-1.
That such a value of t exists when AlO holds is
shown in the proof of Lemma 3.5, Frees(1983).
It follows that (3.8) holds and thus
n
[ hg.(r)( q,. ) - EGJ hg.( r(q,.)
) ] ... 0 a.s., from which the lemma follows by
n -1 ~'-1
J-
j
J
J
J
J
0
Chow(1965).
To finish the definition of C n given by (3.3), let
C*
_ [ \)n
n+l -
Lemma
~
/3 2
8r
(1+1)
Afi /3i
J
1
/(2r+l)
Assume AI-A3, A9, and AlO hold. Then C n ... Copt a.s.
Proof: The proof is immediate from (3.3), Lemma 3.6, and Lemma 3.7.
o
Condition A9 is strengthened before stating Theorem 3.3.
A9': For some integer r
~ 1,
Hog(l) and Hog(r+l) exist and are bounded on the entire
real line, and both functions are continuous in a neighborhood of q,.
Further suppose
> 0 and all real
x,
Then n(1-,)/2( q,n - ¢)
g
Hog(2) is continuous in a neighborhood of q, and that for some d
Hog(r+l)(x) = Hog(r+l)(q,) + O( I x _ q, Id ).
Theorem a...3..:.
2
N (JJ2' 0'2)
Assume AI-A3, A9', and AlO hold.
where
- 67 -
=-
J.&2
r
(r+l)
2 C 1 A opt Copt Hog
(<p)
g( <p) [ 1+; ]
PI
and
2
2
-1
(1)
2 _ C 1 A opt Copt Hog (<p) /3 2
0'2 [g(<p) ]2 [ 1+; ]
with /3 1 =
~
r.
+1
Jur K(u)du
and /3 2 =
J+l [
-1
K(u) ]2 du .
-1
Proof: Once again the proof is an application of the Fabian result. By definition
For the algorithm (3.6), as in the proof of Theorem 2.2 with Un = <Pn -<p,
U n+l =
1 2
d D ( )1
U n [ 1 - A nn -1 dt
g t t='1n ] - A n n -1 Cn / n ;/2 y n - Ann -1 ~n
= Un [ 1 - n
-1
rn
]+ n
(~-1)
~n
yn
+ n
-1 -(1-;)/2
n
Tn
where
1,
and
The indicated limits hold by Lemmas 3.5 and 3.8. Take
- 68 -
0'
= 1 and /3 = 1-;. Then
-(a+/3)/2~ V
U n +1 -- U n [1 - n -ar]
n - n
'¥n n
+ n -a-(/3/2)T n·
Let
-1 r
(r+l)
= - C 1 A opt [ g(4)) ] Copt Hog
(4)) /3 1
by Lemma 3.5 and the proof of Corollary 3.1.
By definition E V n
CJn
= O.
Lemmas 3.1 and 3.2 hold with Cn
= Cnn -'"'(
and
thus by slight modifications to the proof of Lemma 3.3,
To see that E
CJn
V; is bounded, note that by Lemma 2.4 and Lemma 3.2 (a),
Thus there exists a constant K such that K
> I ECJn V; - E I -+ O.
It remains to be shown that for each r = 1,2,3, ...
By Lemmas 2.4 and 3.2 (a), for t
~
lim E{V; x[ V;~rn]}=O.
n-+oo
1
(3.9)
= o( 1).
Take 1 ~ q ~ 3 and p ~ 3 such that 2/p
+ l/q =
- 69 -
1. Using (3.9), it can be shown (see the
e·
proof of Lemma 2.7, Frees(1983)) that
E{ V~ X[ V~~rn]~
o( 1)
l/q ]p/2
[n
Cn
o( 1)
= o( 1)
since b +"'(
o
< 1/3. This completes the proof of the theorem.
3.5 A Fixed-Width Confidence Interval For The Optimal BRP
This section concerns itself with opimal stopping for the stochastic approximation
algorithm of Section 3.2.
The specific objective is a stopping time for fixed-width
confidence interval estimation of <p.
The principal results given here are an application
of the work of Frees (1985) on fixed-width
approximation procedures.
confidence intervals for stochastic
(See also Frees (1983).)
As in Sections 3.3 and 3.4, it is
assumed throughout that the kernel estimator hg n defined by (3.1) with p
=0 is used
to
estimate Hog(l) in the procedure given by (2.2) and (2.3). The constants A and C will
once again be taken as fixed.
Frees (1985) studied the following general stochastic approximation procedure.
Let f: R..... R be a measurable function such that 0 is the unique root of f(x) = O. Suppose
(3n and (n are random variables. Define successive estimates of 0 recursively by
(3.10)
X n +1 = X n - n
-1
{f(X n )
+ n T-1/2 (3n + n T(n
- 70 -
}
e
where the constant
T
is such that
°
~ T ~
1/2. As noted by Frees, the random variables
/3 n and fn may be interpreted as the (standardized) bias and error that arise fr0111
estimating f(X n ).
Using the sequence of estimates {X n } derived from the algorithm (3.10),
Frees(1985) defines a stochastic process W n on D[O, 00), the space of all functions
defined on [0, 00) having left-hand limits and continuous from the right.
Weak
convergence properties of this stochastic process are explored to more fully characterize
weak convergence of the sequence {X n }. For appropriate sequences of positive random
variables {N n} the same weak convergence is shown to hold for the randomized version
W N • This in turn is used to show that for certain sequences of stopping times {N n }
n
the randomly stopped estimator X Nn is asymptotically normal.
Frees then gives a
stopping time and confidence interval for ().
Let 1 - cr, cr E (0, ~), be the desired asymptotic coverage probability.
We will
need to refer to Frees' assumptions and some results, quoted here for convenience.
Fl: Let '1 >
°and G> 1/2 -
F2: Let p >
°
F3:
and
Assume that f(x)
T.
/3 E R. Assume that /3 n = /3
= G( x -
+ o(
n
-p
())
+ O(
1x
1
_ () 1
+1\
).
lim X n = () a.s.
n-+oo
F4: There exists a standard Brownian motion B on [0, 00) and constants
(1,
f
> 0 such
that
~
LJ
L
-
~k-
(1
B(t)
+
1 2
O( t / -
f
) •
k~t
F5:
Let Nn/m n = N
+
op(I), where N is a positive random variable, {mn} are
integers going to infinity, and {N n} is a sequence of random variables.
- 71 -
t1ri = t1
F6: Let {t1~}, {O'n}, and {G n } be sequences of random variables such that
0(1), O'n =
0'
+ 0(1), and G n
=G +
+
0(1).
Theorem(Frees):
Consider the algorithm defined by (3.10) and assume F1-F4. Let
(3.11)
Wn(t)
= [nt]
1/2-r
t1
( X[nt]+l - 0) + (G-1/2+r)' n
where in this expression [.] denotes the greatest integer function.
standard Brownian motion B on [0,
00 )
= 1,2, ...
Then there exists a
such that
Wn(t)
D
-+
Z(t) ,
where
Z(t) = { 2(G-1/2+r) } -1/2
0' t
-(G-1/2+r) B(t 2(G-1/2+r» .
Further assume that F5 holds. Then
o
Denote the (l-O')-th quantile of the standard normal distribution by
Zo"
Define
the stopping time N d for each d > 0 by
inf{ n~l: d ~
zO'/20'n
n
-(1/2-r)
-1/,) }
[2(G n -1/2+r)]-
(3.12)
00
if no such n exists,
where the random variables O'n, and G n are described in F6.
(3.13)
Let
_
-(1/2-r) (
t1~
')
-1/2 )
In - [ Xn+n
(G _1/2+r)-ZO'/20'n[2(G n - 1 / - + T ) ] ,
n
- 72 -
Xn
+n
-(1/2-r) (
be a confidence interval for
P~
(G n -1/2+r)
-1/2 )
+ zo/2O'n[ 2(G n -1/2+r) 1
(J.
Corollary (Frees): For the algorithm defined by (3.10) assume FI-F4 and F6. Then
o
lim P«(J E IN ) = 1-0.
d~
d
Define the following quantities:
(3.14)
f(x) = A Dg(x),
G =
Ad~Dg(x)lx=4>
Pn = C r AC 1 [ g(4)n) ]-l c; r ECJn[ hg n (4)n) - Hog(l)(4>n)],
P = C r AC [ g(4)) ]-1 Hog(r+1)(4»
1
~
+1
JurK(u) du,
-1
in
= n
-r
A[ Dg n (4)n) - E Dg n (4)n)] = A C
CJn
-1/2
V n,
r = 1/2, and" = 1/2 - r.
Lemma 3.9 gives sufficient conditions for F4 to hold.
The proof relies on the
following theorem due to Strassen(1967), given here as it appears in Sen(1981).
Theorem(Strassen)j
Let {X n , ~n }~=1 be a martingale and X o = EX 1· Define
xh
X n - X n -1 for n ~ 1 so that {X~, ~n } ~= 1 is a martingale difference sequence. Let
Yn
= Ln
k=1
ECJ
k-l
, 2
(X k ), n ~ 1,
where it is assumed that ECJnC X~)2 exists a.s. for every n ~ 1.
process S = { Set); t
~
Define the stochastic
0 } by S(O) = X o = 0, S(Yn) = X n for n ~ 1, and
- 73 -
S(t)=S(Y n )+
Let f =
{f(t); t
> o}
[
y
t - _Y
Yn
n+l
n
] X' +
n 1
forYn::5t::5Yn+l,n~l.
be a nonnegative, non decreasing function such that f( t)
increasing and t -If(t) is decreasing in t
> O.
Assume Y n ....
00
IS
a.s. as n .... 00 and
(3.15)
Then there exists a Brownian motion process W = { Wet); t
~
0 } such that
Set) = Wet) + o( (log t) [tf(t)]1/4) a.s. as t .... oo.
Lemma ~ Assume AI-A3, A6 and A9 hold with 1 = 2r~1
In
0
A6 (for the r of A9).
Then F4 holds with
(J'
where
Proof;
(J'I
=
{(J'I
[2f - ( 1-1)] }1/2,
appears in Corollary 3.1.
By definition, {In, c:J' n+l} ~1 is a martingale difference sequence.
Under the
conditions of Corollary 3.1, by the limit indicated there for Ec:J'n V~,
Thus, as required by the Strassen result,
lim Yn = n....oo
lim '"
Ea l2k =
n....oo
L..,
~k
00.
k::5n
Let f be a nonnegative, nondecreasing function.
holds, we first note that for p > 2,
- 74 -
To show that condition (3.15)
[f(Y n )]
-1
2
2
-(p-1)
{2
p-2
2
}
E'!fn{'-nX['-n > f(Y n )]} = [f(Y n )]
E'!fn '-n[f(Y n )]
X['-n>f(Y n )]
~
[f(y n )]-(p-1) E'!fn
I '-niP
From the proof of Theorem 3.1 (b), also using Lemma 3.2 (a),
p/2
p
p
(p-1h
Cn
E'!fn l Vnl ~ cn { K 1 + K 2 n .
+ K 3 }.
It follows that
Thus, taking '- > 0 and f(t) = t 1 -,-,
f:
n=1
[f(y n )]- 1 E'!f { '-5 x['-5>f(Y n )]}
n
=
f:
n=1
f:
~
n=1
K [f(y n )]-(P-1)n(P/2-1 h
4
K [ (Yn/n)1-,- n 1 -,-]-(p-1)n (p/2-1h
4
~ K
-(1-,-)(p-1) (p/2-1h
~ LJ
5n
n
n=1
This last summation is convergent for sufficiently small '- and large p. The remainder of
the proof is identical to the latter portion of the proof of Lemma 5.4 of Frees(19S3),
applying the Strassen result to show that there exists a Brownian motion such that F4
holds.
o
In Lemma 3.10 the assumptions F1 and F3 are shown to hold under conditions
similar to those of Corollary 3.1. Assumption A9' is first strengthened:
"
A9:
.
For some mteger
r~2,
Hog
bounded on the entire real line.
continuous in a neighborhood of <p.
(1)
,Hog
(2)
,Hog
(3)
Furthermore, Hog
,and Hog
(1)
,Hog
(r+1).
eXIst and are
(2)
,and Hog
(r+1)
are
Also assume that for some d > 0 and all real x,
- 75 -
e·
Lemma a.l!l:. Assume A1-A3, A6, and A9" hold. Then F1 and F3 also hold with G =
AlxDg(x)lx=</I =
r.
Proof: B3 holds by Corollary 3.1 (a).
Expanding the function f(x) = A Dg(x) about </I yields
for some </I', I </I' - </I 1:$ I x - </I I.
0
Lemma 3.11 verifies condition F2.
Lemma
a...u:.
If A1-A3, A6, and A9" hold, then F2 holds.
Proof: By definition and the proof of Lemma 3.1 (b),
f3 n - f3 =
- [g(</I)]-1 Hog(r+1)(</I)
~
+1
Ju r K(u) dU}
-1
= K 1 [g(</In)]-l
~
+1
Jur K(u) [Hog(r+1)(71(U»
- Hog(r+1)( </I)] du
-1
where
1
71(U) - </In
I
:$ CnU. Thus by A3, A9", and the asymptotic normality of <Pn,
- 76 -
+1
= K 1 [g(l/>n)]-l ~ Ju r K(u) [ Hog(r+1)( 7](u))
f3n - f3
- Hog(r+1)( I/> ) ] du
-1
+
K 1{ [g(l/>n)]-l- [g(I/»]-l} Hog(r+1)(I/»
~
+1
Ju r K(u) du
-1
= D(
I 7]( u) -
I/>
Id
) + D(
I I/>n
- I/>
I)
= D( [ D( n -rr + cn) ]d ) + D( I I/>n - I/> I )
= D(
n -d-y )
+
D( n -r-y )
o
which establishes the lemma.
Theorem 3.4:
Assume A1-A3, A6, and A9" hold.
W n be defined by (3.11), with f3, G, and
Ie
For t ~ 0 let the stochastic process
defined by (3.12).
Then there exists a
standard Brownian motion B such that
Wn(t)
= [nt] 1/2-T (X[nt]+l
D
- 0) - III -+ Z(t)
as n-+oo
where
The constant
Proof:
(T
has been defined in Lemma 3.9.
By (3.14),
G~1e
= Ill·
The theorem follows immmediately from the theorem
due to Frees(1985) since the assumptions have been shown to hold in Lemmas 3.9, 3.10,
and 3.11.
o
Define in = n -1
f:
j=l
Let an, An, and 'It n be as defined in Section
[g( I/>n) ]-1.
3.4. Also, define as estimates of G, f3, and
(T
the random quantities
- 77 -
e·
and
O'n
J1 /2,
= C 1 A Tn [ C -1 f3 2 l)n
respectively.
Corollary ~ Assume A1-A3, A6, and A9" hold. For d
by (3.13). Then lim P( t/J E IN )
d-+O
Proof:
d
> 0, define Nd by (3.12) and In
= I-a.
By small changes to the proof of Lemma 3.5 it can be seen that G n -+G a.s.
Since Corollary 3.1 holds, Tn-+[g(t/J)]-l a.s.
larly, by Lemma 3.6,
O'n -+0'
Thus f3~-+f3 a.s. by Lemma 3.7.
Simi-
a.s. Thus assumption F6 of Frees is satisfied. Since FI-F4
hold by Lemmas 3.9, 3.10, and 3.11, the result holds by the Frees corollary.
- 78 -
o
CHAPTER 4
RENEWAL FUNCTION ESTIMATION AT A POINT
4.1
Introduction
Let X, Xl' X 2 , ... be independent and identically distributed random variables
(k)
having distribution function F.
Put Sk
Xl +... + X k and denote by F
(t)
=
P(Sk
~
=
t) the k-fold convolution of F. The renewal function H can then be defined by
H(t)
(4.1)
=L
F(k)(t),
-00
<
t
<
00.
k~l
In Section 2 we discuss moment conditions on F sufficient for the finiteness of H( t),
-00
< t < 00.
If F is a lifetime distribution (Le., F(O-) = 0), then H(t) is the expectation
of the renewal random variable N(t)
= max{k: Sk ~t}.
expected number of partial sums Sk such that Sk
~
In general, H(t) is the
t.
Motivated by (4.1), Frees (1986ii) considered estimating H(t) for t
mating a finite number of convolutions of F. For 1 ~
k~
~
0 by first esti-
n, let X c1 ,,,,,X ck be a subset
of k observations from the random sample Xl' ... ,X n . Denote by
t the summation over
all (k) such distinct subsets of k observations. A natural estimate of F(k)(t) is the Ustatistic:
Let m
= m(n)
and m(n) -
be a monotone increasing sequence of positive integers such that m
00
as n -
00.
Define the renewal function estimator Hn(t) by
~
n
m
L
(4.2)
k=l
Let X-
= mineO,
X).
Frees showed that Hn(t)
(4.3)
-+
(k)
F n (t).
e·
Assuming F has positive mean I-l and finite variance
(12,
H(t) a.s. if either
for some r ~ 3, EIX-I r
<
00
and n
=O(m r - 2 ),
or
(4.4)
for some 0 1 > 0 and all
101 < 01'
Eexp( -OX-)
< 00
and log n
= oem).
The assumptions about the growth of the parameter m are such that the bias goes to
zero at a fast enough rate for a.s. convergence.
Frees also establishes that when
m(n) = n, this convergence is uniform over bounded subsets of nonnegative real
numbers: for fixed T
>0
sup IHn(t) - H(t)1
tE[O,T]
-+
0 a.s.
Under conditions similar to (4.3) and (4.4), it is proven that n
totically normal with the limiting variance
weakly consistent estimator of
1 2
/ [H n (t)
(I: dependent on t and F.
- H(t)] is asympFrees provides a
(I:, allowing the formulation of large sample confidence
intervals for H(t). In Section 2 of this chapter weaker sufficient conditions for the consistency and asymptotic normality of Hn(t) are given.
Large sample behavior of Hn(t) for a single value of t is further characterized in
Section 3 by an in variance principle useful for studying sequential versions of Hn(t). In
(I; are given. One of
for (I;; the other is a
Section 4 two consistent estimators of the asymptotic variance
these, considered by Frees, relies on an explicit expression
jackknife estimator.
A stopping time for a fixed-width confidence interval for H(t)
- 80-
IS
defined in Section 5. Asymptotic properties of the stopping time (as the desired interval
width goes to zero) are then explored.
4.2 A Correction And Some Generalizations
A small error occurring in Frees (1986ii) is corrected before proceeding to note
how some of Frees' results hold under more general conditions than given in that paper.
Theorem 2.1 of Frees (1986ii) gives sufficient conditions for
(4.5)
lim
n-+oo
where a E Rand ga: R+
-+
R is such that
Setting a = 0 and ga(u) = x(u
H(t) - H(O-) a.s.
~ t)for some t E R+, if (4.5) holds, then Hn(t) - Hn(O-)
-+
This does not imply the consistency of Hn(t) when H(O-) > O.
Instead, let ga: R -+ R be a function such that
The trivial remedy is then to replace integration over the half-line [0,00) by integration
over the whole real line in Frees' Theorems 2.1 and 2.2.
The two results quoted here due to Heyde (1964) bear directly on the question of
- 81-
when H(t) is finite. They will be used to indicate how Frees' estimator is more generally
applicable.
Lemma (Heyde):
EIX-I r <
00
Let
EIXI < 00
and
EX> 0,
or else
EIXI =
00
e·
and, in either case,
for some integer r ~ 1. Then
Lkr-2p(Sk $x)
<
00,
<
-00
x
<
00.
k~l
Theorem (Heyde):
EIXI < 00, EX> 0,
Suppose
and let r be a nonnegative integer.
A
necessary and sufficient condition for the convergence of the series
L k r P( Sk $ x),
<
-00
x
<
00,
k~l
is that
EIX-l r + 2 < 00.
Thus if
EIX-I < EIX+I
be finite is that
condition that
EIX-1 2 < 00.
EIX-1 2 < 00
$
00,
then by the lemma a sufficient condition for H(t) to
By the theorem we see that when 0
< EX <
00,
the
is also necessary.
The remainder of this section describes
how the sufficient conditions for
consistency and asymptotic normality of Hn(t) given by Frees can be weakened.
Frees
assumed throughout his paper that F has positive mean JJ and finite variance 0'2.
Note
that (4.3) and (4.4) both imply that
(4.3)
holds, then
o $ EIX-I < EIX+,
Frees'
$
00,
EIX-1 2 < 00
and thus that H(t)
proof of the consistency of Hn(t)
< 00
for all t E IR. If
requires only
that
the proof not relying on a finite mean JJ or finite variance 0'2.
Assuming (4.4), Frees uses the condition 0
(4.6) there exists a 8 2 > 0 and 0
<
P
<
<
JJ
<
00
to show that
1 such that F
- 82-
(k)
(t) $ exp(8 2 t)p
k
for all k 2: 1,
e
for all t E R.
The remainder of the consistency proof assuming (4.4) relies solely on the inequality
(4.6). Thus, in this case, the assumption of a finite variance can be dropped.
If F(O-) = 0 and F(O) < 1, then it is easily shown that (4.6) holds with ()2 = 1,
even when
EX =
00.
Taking log n = o(m), the consistency of Hn(t) follows by the same
arguments used by Frees when (4.4) holds and IJ <
00.
Theorem 2.2 of Frees is now restated with the weaker conditions suggested by the
preceding arguments. With the previously mentioned corrections to Section 2 of Frees'
paper, it is not necessary that t
~
O.
Theorem A (Frees): Assume
(4.7)
0
~ EIX-I
<
EIX+I ~
00
and (4.3) holds,
or
(4.8)
F has finite mean IJ > 0 and (4.4) holds,
or
(4.9)
F(O-)=O, F(O)<1,andlogn=0(m).
Then for each t E R, Hn(t) -
H(t) a.s. as n ....
00.
If F is continuous, it is not essential that m(n) = n for there to be uniform
convergence on finite intervals.
Theorem:
Assume F is continuous.
Then for fixed T
Further suppose that (4.7) or (4.8) or (4.9) holds.
>0
- 83-
sup IHn(t) - H(t)1
tE[O,T]
Proof:
e·
0 a.s.
-+
If F is continuous, then H is also continuous.
Thus {H n } is a sequence of
monotone increasing functions converging pointwise to a continuous monotone increasing
function on the interval [0, T] by Theorem A.
uniform on the interval.
It follows that the convergence is
0
Frees' result on the asymptotic normality of n
1 2
/ [H n (t)
- H(t)] also holds under
somewhat weaker conditions. These less stringent assumptions are identified separately
here, since they will be used throughout Chapter 5 and 6.
B2:
F has finite mean Jl>O.
For some 01>0 and all
1°1<°1'
Eexp(-8X-)<oo and
log n =o(m).
B3:
F(O-)
= 0, F(O) < 1, and log n =o(m).
Assumptions B2 and B3 are identical to (4.8) and (4.9).
A modified version of Theorem 3.1 of Frees is now given. Again, the result can be
extended to include the case when t is negative.
Let X have distribution function F and
{rst(c) = Cov{F
for integer r,s
~ 1 and
(r-c)
l:::;c:::;rl\s. Define
(t - X), F
0'; =
- 84-
~
r,s=l
(s-c)
(t - X)}
rS{rst(l).
Theorem B (Frees): Assume Bl or B2 or B3 holds. Then for each t E R,
A careful reading of the proof of Frees' Theorem 3.1 shows that the weaker
moment conditions of Bl and B2 still allow the indicated result. That B3 is a sufficient
condition follows from Frees' proof by using the fact that B3 implies (4.6).
(See also
Section 5.3, Theorem 5.3 for a proof of this result.)
4.3
An Invariance Principle
For a fixed value of t, define the stochastic process Y n = {Yn(s), s E [0,1]} by
Yn(s) = 0,
0 5 s
Yn{s) = Yn{k/n),
where
Y n{k/n) -
<
m(n)/n,
kin
5 s < (k+1)/n, k = m{n), ... ,n-1,
k[Hk{t)-H{t)]
n
1/2
and
O't
Let D[O,I] be the space of all functions on [0, 1] that are right continuous and have lefthand limits. Then Y n E D[O,I] for all n. It will be shown that Y n converges weakly to a
standard Brownian motion in the J 1-topology on D[O,l].
For the results of this section we will need a slightly stronger condition than B1.
Thus let B1' be as follows.
B1':
F is such that 0
5 EIX-I < EIX+I 5
n = O(m 2r - 4 ).
- 85-
00.
For some r
~ 7, EIX-I r < 00 and
We first consider a process closely related to Yn.
Let Y~ = {Y~(s), sEra, I]}
be defined by
°~
Y~(s)
= a,
Y~(s)
= Y~(k/n),
s
< m(n)/n,
kin ~ s
E;:(~) F(j)(t)]
k[ Hk(t) where
=
n
n 1 / 2 [H n (t) and
< (k+l)/n, k
1/2
= m(n), ...,n-1,
k = m(n), ... , n-l,
ITt
E;:::)
F(j)(t)]
ITt
Lemma 4.1 states the asymptotic equivalence of Y n and Y~.
Lemma 4.1: Assume Bl or B2 or B3 holds. Then
sup IY~(s) - Yn(s)1 SE[O,l]
a
as n -
00.
Proof: Note that
sup IY~(s) - Y n(s)1
sE[a,l]
=
<
by Lemma 3.1 of Frees (1986ii).
max
m(n)sksn
1
ITt
0
asn-oo
0
- 86-
I:
j>m(k)
F(j)(t)
e·
In defining several other stochastic processes, use is made of a standard
decomposition for V-statistics due to Hoeffding. For integer
e~ 1
let
and
Also, define the kernel
=
where Se,d,h
d
h
L
(-1) Se d h'
h=O
' ,
= fge,(d-h)(xjI"",xj(d-h»' the summation being over all (d~h) subsets
of d -h elements drawn from {xl',,,,xd}' Then
F~e)(t) may
be written as
where
f in this instance denoting su mmation over all su bsets of d observations from XI"'" X k '
A standard property of the (e ,d is that
o $ (e,d
$
(e,e
for 0 $ d $
E[gO
e,d
e.
Another fact is that
(X'I' ""X'd)gO (X· I , ... ,X· d )] =
I
It,d
J
J
- 87-
0
e·
Another property of the decomposition given for
F~e)(t)
is that for 1
~
d < d'
~
e,
E[ F~e~ F( e) ,] = O. Hence, for future reference
,
k,d
(4.10)
where the constant K I is independent of k, e, and d.
We now define a projection of
{y~I)(8), 8 E [0,
Y~ in terms of i.i.d. random variables. Let y~l) =
I]} with
y~I)(s) = 0,
0
~
s
<
I/n,
y~1)(s) = y~I)(k/n),
where
~
kin
m(k)
(j)
k E. 1 j Fk 1
J=
'
n
1/2
s
=
O't
k=I, ... ,n.
- 88-
<
(k+I)/n, 1
~
k
~
n-I,
(2)
(2)
Next, let Y n = {Y n (s), se[O, I]} be defined by
y~2)(s) =
° $ s < lin,
0,
y~2)(s) = y~2)(k/n),
kin $ s
men) (j)
k Ej =l Fk,l
where
n
1/2
<
(k+1)/n, 1 $ k $ n-1,
=
(Tt
k=l, ... ,n.
We first show that under appropriate conditions
Brownian motion in 0[0,1]. Using
y~l),
0, implying by Lemma 3.1 that Y n
~
Lemma 4.2:
y~2) ~
B, where B is a standard
sup ly~2)(s)_Y~(s)1 1:
se[O,l]
it is then shown that
B.
Assume B1 or B2 or B3 holds.
Then
y~2) ~
B as
n ..... 00 In
topology on 0[0,1], where B is a standard Brownian motion.
Proof: Define the random variables Zni by
i
= 1, ... ,n,
and
°
k
Let S n k = E.1= Z nl. and GJ n k= GJ k = (T(X 1 , ... , X k ), k = O,l, ... ,n. We note that
E«
~ n(i-l)
Z . = 0,
m
i
= 1, ... , n,
- 89-
and
E
Z2
':f n(i-I) ni
-
~
n
0' t
e·
men)
L
rS~rst(I), i = 1, ... ,n.
r,s=1
Thus, for any fixed s E [0, 1] and associated sequence of integers {k n } 00
such that
n=1
kn/n ~ s < (k n 1)/n for all n, we have that
+
kn
L E':f n(i-I) Z2.m
i=1
=
men)
n
--2
k
"LJ rs ~rst(l) n O't r,s=1
as n -
s
00.
Also, by Lemma 3.2 of Frees (1986ii) or Lemma 5.3 of this work, the Lindeberg
condition
as n-oo
is satisfied.
By a theorem of McLeish (1974) (see Sen(1981), Theorem 2.4.2),
y~2)
converges weakly (in the J I-topology on D[O,I]) to a standard Brownian motion in
C[0,1].
0
Lemma 4.3: Assume Bl ' or B2 or B3 holds. Then for every
(>
°
as n -
00.
Proof: Let
k
77 n k
L
i=1
and ':f k = O'(Xl' ... ,X k ). Define m- 1(e) = inf{k: m(k)~e}. Then for e = 1, ... ,m(n),
l
{77 k; ':f }m- (e+l)-1 is a zero mean martingale. Thus
n
k k=m -lee)
- 90-
e
<
m(n)-l
m- 1 (e+1) m(n)
2 2
rs le rst (1)I,
e=m(m(n)) n ITt l
r,s=e+1
L
L
by the Kolmogorov inequality for submartingales.
Note that
e ()I
F(rvs)(t),
I"rst
1 ~
(4.11)
where rVs is the maximum of rand s. Also,
(4.12)
B2 and B3 both imply by (4.6) that for any fixed a E R
for each t E R.
Thus, since m- 1(m(n)) ~ n,
- 91-
m(n)-1
L
l=m(m(n))
m- 1 (l+l)
2
n ITt
1
< IT 2
as n -
00
l
L
r,s=l+1
rs le rst (I)1
f:
00
l2
t
< K1
m(n)
2
L
l=m(m(n))
f:
e·
rs F(rvs)(t)
r,s=l+1
r 4 F(r)(t) -
0
r=m(m(n))
by (4.12) if B2 or B3 holds and by the Heyde lemma if BI' holds.
0
completes the proof.
Lemma 1d:
This
Assume Bl' or B2 or B3 holds.
If [m(n)]2 log n=o(n), then for every
p{m(n)::;k::;n
max
ly~I)(k/n) - Y~(k/n)1 > l} -
0
Proof: Using the previously defined decomposition of
as n -
00.
F~l)(t), we
have that
p{m(n)::;k::;n
max
ly~I)(k/n) - Y~(k/n)1 > l}
l
(l)
- p{m(n)::;k::;n
max
kl L
L (~) F k ' d I> n
l=2 d=2
m(k)
l
m(n)
<
L p{m -l(max
l=2
l)::;k::;n
kl
1 2
/ l
ITt}
L (l) F(l)
I> n m(n)ITt}.
d=2 d
k,d
1/2
l
Let Xk:I,Xk:2"",Xk:k be the order statistics of the random sample of XI, ... ,X k "
Define
ek
= IT{ X k : I ,X k :2 , ... ,X k : k ; X k + I , ... ), k = 1,2, ... , the usual symmetric a-fields
- 92-
associated with Xl'X 2 '.... Since the
F~~~
are V-statistics for k
~ d, {F~~~, ~k }k~d
a reverse martingale. Letting
m -1( t) :5 k
<
00,
t = 2,3, ... ,
it follows that {N tk' ~k} 00
-1
is a reverse martingale for each t
k=m (t)
= 2,3, ....
An upper bound for E N~k is now derived. By (4.10)
< K t 4 F(t)(t) ~ [(t-2)!f(k-d)!
-
d~2 [( t-d)! ]2(k-2)! .
2 k2
Because [m(n)]2 log n = o(n), ultimately [m(n)]2:5 n. Thus for large enough
and
Hence there exists a constant K 3 such that
- 93-
e,
is
eUsing the Kolmogorov inequality for reverse submartingales,
<
since
~ e5 F(e)(t)
e=2
<
establishes the lemma.
00
by either Bl ' and the Heyde lemma or (4.12).
This
o
Lemma 1&: Assume Bl ' or B2 or B3 holds. Ifm(n)=o(n), then for every (>0
p{ O~k<m(n)
max
ly~2)(k/n)1 > (}
Proof:
By definition, for fixed n,
-
0
as n -
00.
{y~2)(k/n)j ~k }m(n)
k=1
the Kolmogorov inequality for submartingales,
- 94-
is a martingale.
Thus, using
p{ O~k<m(n)
max
ly~2)(k/n)1 >
l
}
< m(n)
n l2 CT 2
t
as n -
L
rS{rst(1) r,s=l .
0
o
00.
Theorem
m(n)
1...l. Assume B1' or B2 or B3 holds.
If [m(n)]2 log n=o(n), then Y n
~ B in
the J 1-topology on 0[0,1], where B is a standard Brownian motion.
Proof: For any
>0
l
~ pI
ly~2)(k/n)1 >
l} + p{m(n)~k~n
max IY n(k/n) - y(2\k/n)1 > l}
n
max
ly~2)(k/n)1 >
lo~k<m(n)
l} + p{m(n)~k~n
max IY n(k/n) - Y~(k/n)1 > l/3}
max
lo~k<m(n)
~ pI
+ p{
-
0
max
m(n)~k~n
IY~(k/n) - y~l)(k/n)1 > l/3}
asn-oo
by Lemma 4.1, 4.3, 4.4, and 4.5. Thus Y n and
y~2)
have the same limiting distribu-
tion, which is the distribution of a standard Brownian motion by Lemma 4.2.
- 95-
0
4.4
Estimating The Variance
e·
By Theorem B of Section 4.2, for a large class of distribution functions F,
n
1 2
/ [H n (t)-H(t)]
has asymptotic variance
Cov{ F(r-1)(t_X), F(s-l)(t_X)}.
0';
=
~
r,s=l
rserst(1), where e rst (l) =
Frees (1986ii) defined an estimate of
0';
motivated
explicitly by this definition; this estimator was shown to be weakly consistent with the
goal of constructing a large sample confidence interval for H(t).
In Section 4.4.1
sufficient conditions are given for Frees' estimator to be strongly consistent. A jackknife
estimator considered in Section 4.4.2 also converges with probability one.
Strong
consistency is of importance here because subsequently these estimators will be used in
defining stopping times for estimation of H(t).
Chapter 5 is concerned with estimating H( t) by H n (t) simultaneously for all
t E [O,T], where 0
<
T
<
00.
With this problem in mind, let W be a monotone
increasing function of bounded variation on· the interval [0, T] and define 1* =
JT O'~ dW( u).
o
1* is then a (weighted) measue of expected error in estimating H on the
interval [0, T].
For an application in Chapter 5 an estimate of 1* will be necessary.
Thus this section treats the slightly more general problem of estimating 1*.
4.4.1 Frees' variance estimator
For
b(t) =
~
r,s~l
let erst = E{F(r-1)(t_X),F(s-1)(t_X)},
r F(r)(t), so that
r,s=l
0'; = a(t) Define the kernel
- 96-
[b(t)]2.
a(t)
00
E rserst' and
r,s=l
and let X il ,,,,,Xi(r+s_l) represent any permutation of r+s -1 observations drawn
from
the
random
[n-(r~s-I)]!
sample Xl' ... ,X n .
E
Let
p
denote
the
summation
over
all
such permutations. Then
is an unbiased estimate of erst.
Let m.
= m.(n)
positive integers such that m.(n) - 00 as n m.
(r)
b n (t) = E I' F n (t). Frees considered
1'=1
be a monotone increasing sequence of
00.
Define an(t) =
m.
~ rsenrst
r,s-1
and
(4.13)
as an estimator of (Tt2 and gave sufficient conditions for (T2
nt
to be weakly consistent.
Now let
In -
T 2
0 (Tnt dW(t)
J
be an estimator of I.. In Theorem 4.2 sufficient conditions are given for I n to converge
to I. with probability one.
The following assumptions will be used.
p
EIX-I < 00
(p-5)
and n = O(m.
).
B4:
For some p;::: 6,
B5:
F has finite positive mean JI. For some 01 >0 and all
and log n
B6:
= o(m.).
F(O-) = 0, F(O)
<
1, and log n = o(m.).
- 97-
101 <01'
E exp(-8X-) <00,
Theorem~:
Suppose B4 or B5 or B6 holds. Then I n -. 1* a.s.
order statistics. It is first shown that an{t)
~
aCt) a.s. for fixed t E [0, T]. Define
00
L:
an{t)
r,s=1
For
all
n~
rSE en { grs{t; Xl' ... ,X r + s _ l ) }
is
I,
measurable.
Een{grs{t; XI"",Xr+s_I)}=enrst.
fields, E
en an_I =
If
r+s-I:=;n,
then
Since {en} is a decreasing sequence of sigma-
an' Hence {an, e n }:=1 is a reverse martingale and con verges to its
expectation a{ t) a.s.
Let
It remains to be shown that anI ~
°
a.s.
Let m;I{k) = inf{n: m*(n) ~ k}.
fact that for any t E [0, T],
it follows by the Markov inequality that for any
L:
n~I
P{lanI{t)I>(}
< (-1
L:
n~I
- 98-
(> 0,
Using the
L
rs F(rVs)(t)
rvs>m*
< K
I
L
L
n~I
s=m*(n)+1
s3 F(rVs)(t)
If B4 holds, then
In this case
by the Heyde lemma.
With m- 1(k) = inf{n: m(n)=k}, Frees (1986ii) noted that if B2 or B3 holds,
then m -I(n) $ exp(6 -1
m
(n)
n), where 6n -
the existence of a sequence 6 n -
0 as n -
00.
Similarly, B5 and B6 imply
0 such that m;I(n) $ exp(6
-1
m'l" (n)
n). Since both 85
.
(k),
k
and B6 ensure there exists K > 0 and p, 0 < P < 1, such that F
(t) $ K p for all k:2: 1,
we have that
L
n~I
P{la n1 (t)1 > d
=
<
00
K4
L
k 3 exp(k[6
k=l
m.
00.
- 99-
+logp])
-1
(k)
Thus by the Borel-Cantelli lemma anI (t) - 0 a.s. and an(t)
-+
aCt) a.s. when B4 or B5
or B6 holds.
By Theorem 2.1 of Frees(1986ii), bn(t) .... bet) a.s. if
<
This follows from the first part of the proof.
00.
E~=I[m;-l(k) -
Hence, O'~t
....
0';
k]kF(k)(t)
a.s. for each
tE[O, T].
Next, since an (t) is positive and monotone increasing,
-+
Thus
f:
an(t)dW(t) ....
aCT)
f:
f:
dW(t) a.s.
a(t)dW(t) a.s.
Similarly, since b n (t) is also positive and monotone increasing,
Hence,
completing the proof.
0
For future reference, note that if F is continuous, then the functions a and bare
both continuous on the interval [0, T]. Since both an and b n are monotone increasing
on [0, T] for all n
~
1, it follows that
- 100-
e
sup Ian(t) - a(t) I'" 0 a.s.
tE[O, T]
and
sup I bn(t) - b(t) I ... 0 a.s.
tE[O, T]
Thus
2 sup IlTnt
tE[O, T]
ITt2 I'" 0 a.s.
4.4.2 A jackknife estimator of variance
A jackknife estimator of
IT;
is now defined.
For i = 1, ... , n, k = 1, ... , m ( n),
compute the statistics
F(k) (t)
n(i)
where
ECi denotes summation over all
do not include Xi'
(n
kl)
distinct subsets of k observations which
(We will assume throughout that m(n):5 n -1 for all n.)
Let
m
(k)
Hn(i)(t) =E k =l F n(i)(t). Take
(4.14)
to be a jackknife estimator of
IT;.
In * =
As an estimate of 1* define
T *
J0 Vn(u)dW(u).
In Theorem 4.3 sufficient conditions are given for In * to converge to 1* with probability
one.
- 101-
Theorem
1.&: Assume B2 or B3 holds. Further suppose that men) ~ n -1 for all nand
there exists a positive integer no such that [m(n)]2 ~ n - men) for all n ~ no. Then 111*
Some preliminary definitions and lemmas precede the proof of Theorem 4.3.
*
Define H n - 1 (u)
= E km=l F (k)
n-1 (u).
V:(u) -
since E...
{H*n- leu) - Hn(u)}
vn
n(n-1)E
Then
en
[H:_ 1(u) - H n (u)]2
n(n-1) Varen {H *
n - 1 (u) - Hn(u)}
= O.
Using the decomposition of Hn(u) considered in
Section 4.3, define
'Y n 1(u)
and
Taking
= JT n(n-1) Ee
o
n
'Y~1 (u) dW(u)
and
we have that
(4.15)
In Lemmas 4.6 and 4.7 it is shown that I nh ..... 1* a.s. while I n2 *
- 102-
-+
0 a.s.
The
e·
proof of Theorem 4.3 follows.
Lemma 1&: Assume men)
~
n -1 for all n. Suppose B2 or B3 holds. Then I nh -- 1*
a.s.
Proof:
1=
1, ...,n, r,s = 1,2,3, .... Noting that
we can write
_
I nh -
n IT
(n-l)
m
{-I n
L
rs n .L Yirs(u) r,s=1
1=1
o
(r)
(s )
}
F n,1 (u) F n,1 (u) dW(u).
Next, define
Q
nl
-
n2
-
L
IT {
o
n-
m«rvs)~n
1
t
[Yirs(u) - ersu(I)]}dW(U)
i=1
and
Q
L
IT {
o
rs[Yirs(u) - ersu(I)]}dW(U).
(rvs»n
Then
Q
n =
n-
1
.t
1=1
t
JT {
rs[Yirs(u) - ersu(I)]}dW(U)
0 r,s=1
+
Q
n2
is a zero-mean reverse martingale and hence converges to zero a.s.
It is first shown that
Q
n1 -- 0 a.s.
Since
- 103-
Q
n1 is an average of zero-mean i.i.d.
random variables,
e·
<
K~
E[r,s=1
f: rs ITI0 Y lrs(u) IdW(U)]4 + K~n [r,s=1
f: rs ITI0 {rsu(1) IdW(U)]4
n
<
K2
n2
00
00
... L .1.!!4 risi IT ... IT E ..!!4
L
0
0
fl'sl =1 r4,s4=1 -1
I-I
+ K~
n
<
J
[00L I
r,s=1
IY 1rs (ui)ldW(ul) ..·dW(u 4 )
rs T I{rsu(l) IdW(u)
0
4 r.s.] F (max('l"l'oo"'4,s4)) (T)
... f: l.1=1
II
L
rl,sl=1 r4,s4=1
K2
n2
00
I I
+
K
n
f: "F('V')(T)T
i [r,s=1
[r
[r
r
e
dW(u)
0
dW(U)r
0
K
3
< -:"2'
n
by (4.12). Thus fOf any (>0, by the Markov inequality, E~=I P{lanll >
anI -
0 a.s. by the Borel-Cantelli Lemma.
To see that a n 2 -
0 a.s., note that
- 104-
d <
00
and
~
L:
[JTdw(u)l
J(rvs»n
o
2rs F(rvs)(T),
by (4.11). Again using the Markov inequality, for fixed
-
by (4.12).
and
Q
Thus Qn 2 -
K
2
L: s4 F(s)(T)
(> 0,
<
00
s~1
0 a.s., again using the Borel-Cantelli Lemma.
n 2 all converge to zero with probability one, it follows that
n- I
L:n
JT
i=1
m
L:
Yirs(u)dW(u) -
I. a.s.
0 r,s=1
The lemma will follow if it is shown that
m
(r)
(s)
rs F n 1( u) F n 1( u) dW ( u) r,s=1
'
,
L:
(r)
Since F n ,l(u) =
- 105-
0 a.s.
Since Qn, Qnl'
e·
By again applying the Markov inequality and the Borel-Cantelli Lemma, Ln -
0
which completes the proof.
Lemma
U:
0 a.s.,
Assume B2 or B3 holds. If men) = o(n
1/2
), then I n2 - 0 a.s .
.f.mQf: Using the notation defined in Section 4.3,
where the summation is over all
(d=D subsets of d-l
observations from X 1 ' ... 'X n _ 1"
Since
unless {il' ... ,i d }
= {jl' ...,jd}' for large enough n and all
4
<
U
E [0, Tj,
K r [
J2 [d]2
_1_
r 2(d-1)
L F ( r ) (T)
(n-1)2 (n-2)(n-3) ...(n-d)
d!
.
- 106-
Thus, since d
:5 r :5 m(n) and [m(n)]2 :5 n-m(n) (for large n) by assumption,
(:i)
4
[(r)
14
E F n-1,d(u)J
:5
Similarly, it can be shown that
for all u E [0, T]. Since B2 or B3 implies (4.6), for some p, 0
< P < 1,
<
[
By definition,
- 107-
d~l
]1/4
K
()
2
rp F ri (T)
4
n (n-1)2
By the Markov inequality, for every
0
the Borel-Cantelli lemma.
Proof of Theorem 4.3:
K
7
{> 0, P {I I n2 * I > {} ::; 22·
Thus I n2 * ---- 0 by
n {
By Holder's inequality and the conditional version of the
Cauchy-Schwarz inequality,
< 2n(n-1)E en { [
-
fo ;~l(u)dW(u) ]1/2[ fo ;~2(u)dW(u) ]1/2}
T
[ ]1/2[In2 *]1/2
2 I nh
T
----
0
a.s.
The theorem follows by (4.15) and Lemmas 4.1 and 4.2.
- 108-
0
e
4.5 A Fixed-Width Confidence Interval For The Optimal Block Replacement Policy
The objective of this section is a sequential scheme for fixed-width confidence
Let 1 - a be the desired coverage probability.
interval estimation of H(t).
d > 0, define the random interval In = [Hn(t) - d, Hn(t)
+ d].
For fixed
Ideally, one would like
to be able to select the minimum sample size nd such that
>
-
P{H(t)EI.}
nd
I-a.
Since this appears impossible in the present non parametric setting, the approach taken is
to define a stopping time Nd and show that under certain conditions
=
IimP{ H(t) E IN}
d!O
Let
T a/2
d
I-a.
be the (1- a/2)-th quantile of the standard normal distribution.
Take
nd to be the non-random integer such that
where no ~ 1 is an initial sample size.
0';.
Denote by s~ a strongly consistent estimator of
Define the stopping time Nd by
2
Nd
=
. {k : k >
mm
_ no an d sk
k ( T a /2 )2
:5 d 2 },
As is typically done, we will exploit the relationship between N d and nd to establish
asymptotic properties of H N (t).
d
For stopping times defined like Nd one always has that
- 109-
S2
Nd-l
>
(N d -1)d 2
2
(T 0/2)
>
(N d -1)
nd
>
e·
(N d -1) 2
s
Nd
Nd
and thus
s2
Nd-l
(4.16)
~
t
Since by definition N d
--+ 00
nd d 2
O'~( T0/2)2
u~( T 0/2)2
and nd
d2
~
(N d -1)
Nd
s2
Nd
~
t
Nd
as d 10, we have that -n
d
--+
1 a.s. as
d 10.
Theorem B of Section 4.2 gave sufficient conditions for the asymptotic normality
of Hn(t). Lemma 4.8 asserts that the randomly stopped estimator has the same limiting
distribution as d 1o.
LemmaU: Assume B1' or B2 or B3 holds. If [m(n)]2 log n=o(n), then
N
Proof;
used
Let Zn = n
1 2
/
d
[H
Nd
(t) - H(t)]
1 2
/ [H n (t)-H(t)].
to show that
Q N(O,
u 2 ).
t
The invariance principle of Theorem 4.1 can be
the sequence {Z n}
is uniformly continuous in
probability.
Anscombe's theorem then implies that Z N and Z"d have the same limiting distribu tion
d
as d 10, establishing the lemma. (See Woodroofe (1982), Section 1.3 for the definition of
a uniformly continuous in probability sequence and Anscombe's theorem.)
- 110-
0
For the remainder of this section let O'~t and V: (t) be defined by (4.13) and
If
(4.14), respectively.
s~ = O'~t and the conditions of Theorem 4.2 hold, or s~ = V:Ct)
and the conditions of Theorem 4.3 hold, then s~ ~
4.2 and 4.3 by taking W(u) = X( u
= O'~t
This follows from Theorems
t).
Assume Bl' or B2 or B3 holds and that [m(n)]2 10g n = o(n).
Theorem 4.4:
s~
~
0'; a.s.
and the conditions of Theorem 4.2 hold, or s~
= V~Ct) and
the conditions of
Theorem 4.3 hold, then
lim P{ H(t) E IN }
d!O
1 -
O'.
d
Proof: By Lemma 4.8 and the strong consistency of s~,
N
By (4.16),
1/2
d
d
~
-
T
0'/2' which implies that
d
<
-
1 - a
- 111-
N d1/2[ H N - H ( t )]
d
sN d
as d! O.
o
<
If either
N d1/2 d }
~
e·
CHAPTER 5
RENEWAL FUNCTION ESTIMATION ON AN INTERVAL
5.1 Introduction
This chapter considers simultaneously estimating the renewal function at all
points t in the interval [0, T], 0 < T
< 00,
using the estimator of Chapter 4.
In Section 5.2 the objective is minimum risk estimation, where the risk is a
function of the sample size n.
summands.
The risk function considered here is comprised of two
One term, essentially the asymptotic mean integrated square error,
decreases as the sample size increases. The other term is intended to reflect the cost of
the sample and consequently increases with the sample size.
The stopping time
proposed here is of the usual type for minimizing a risk function of this kind. We show
that in a large sample situation this stopping time can give some indication of whether
enough observations have been taken.
To help characterize the behaviour of the estimator when used over an interval,
1 2
the process {n / [H n (t)-H(t)], tE[O, T]} is shown to converge weakly to a Gaussian
process under certain conditions. This result establishes the rate of convergence of the
function estimate and is used to construct confidence bands for the renewal function.
We outline bootstrap methodology for constructing fixed and variable width confidence
bands in Sections 5.4 and 5.5, respectively.
In both settings stopping times are
introduced to guide the user in determining if the sample size
IS
adequate.
The
confidence bands associated with these random sample sizes are shown to achieve their
targeted confidence level in an asymptotic sense.
e
5.2 Minimum Risk Estimation
Let c> 0 be the cost of observing each of the random variables X 1,X 2 , ... , X n to
be used in estimating H(t). As in Section 4.4, let W be a monotone increasing function
of bounded variation on the interval [0, T] for some fixed T> O. For estimation of H(t)
over the interval [0, T] consider the loss function
Io
T
L(n;a,c) -
a
[H n (u)-H(u)]2 dW(u)
+ en.
Here a> 0 is the "cost" associated with a unit of the weighted integrated squared error
while en is the cost of a random sample of n observations. The expected loss, or risk, is
then
Io
T
R*(n;a,c) -
a
E[H n (u)-H(u)]2 dW (u)
+ en.
Ideally one would like to select the sample size n which minimizes R*(n;a,c). However,
for any fixed t the form of the mean square error E[H n (t)-H(t)]2 is complicated by
terms which are asymptotically negligible. Since concern lies in large sample estimation
of the renewal function, R*(n;a,c) is simplified before considering the problem of optimal
stopping.
Letting Bn(t) denote the bias of Hn(t), it follows by definition that
Bn(t) =
L
F(k)(t).
k>m
R*(n;a,c) can be rewritten as
T
R*(n;a,c) = a
Io
2
[Var{Hn(u)}
- 113 -
+ Bn(u)]dW(u) + en.
Frees (1986ii) noted that
e·
Var{Hn(t)}
where as previously defined
{rst(c) =
COy {
F
(r-c)
(t-(X 1 + ..·+Xc)),F
for any r,s, and c, 1 $ c $ rAs.
Since {rst(c) $ F
(s-c)
(rVs)
(t-(X1+· .. +X C ))
.
(t), If B1 or B2 or B3 holds,
then by Lemma 3.3 of Frees(1986ii),
-
O.
Also, when B1 or B2 or B3 holds,
lim
n-oo
n
T 2
Bn(u)dW(u)
Jo
1 2
< [W(T) - W(O)] n-oo
lim [n / B n (T)]2
=
0
by the following lemma.
Lemma (Frees): Assume B1 or B2 or B3 holds. Then
O.
(5.1)
- 114 -
}
Proof: If B1 or B2 holds, the result is identical to Lemma 3.1 of Frees (1986ii). As previously mentioned, B3 implies (4.6), from which (5.1) follows. (See the proof of Lemma
0
3.1 of Frees (1986ii).)
Thus
an- 1 JT{
R*(n;a,c) -
o
E
rS{rsu(l)}dW(U) + 0(n- 1 ) + cn.
r,s=l
As in Section 4.4, let
1*
=
JT{
o
f:
rS{rsu(1)}dW(U).
r,s=l
Then the asymptotic risk may be written as
R(n;a,c) = an- 1I* + cn.
R(n;a,c) is a convex function of n in the sense that
~2R(n;a,c) _
R(n+2; a,c) - 2R(n+1; a,c) + R(n; a,c) > 0 for all n.
Thus there exists a finite nc such that
nc = min{ n': R(n';a,c) ~ R(n; a,c) for all n }.
By definition, nc -
00
as the cost per observation c 1o. As one would hope, the
stopping time Nc which is now introduced also depends on c in such a way that Nc 00
as c 1O. Additionally, it will be shown that Nc enjoys a type of risk efficiency, namely
that
.
I1m
R(nc;a,c)
dO E R(N c ; a,c)
- 115 -
= 1.
Suppose there exists known l' > 0 such that 1* > 1'.
"
Let I n be a strongly consis-
tent estimator of 1*. Define
where no is an initial sample size. By definition N c
holds.
Thus, since nc ....
ficient conditions for E ~~
Lemma
"
U:
Assume In
N
$ K for all n. Then E n~
Proof: For x
since (~)
1/2
~
(~I*
-+
-+
-+
y/
2
as
c1 0,
N
ncc
-+
-+ 00
a.s. as c 10 and the relation
1 a.s. as c 1O.
Lemma 5.1 gives suf-
1 as c 1o.
1* a.s. and there exists a positive constant K such that E "In
1 as cI0.
1 and some constant K 1 ,
= O(nc)' Thus, by the Markov inequality,
since E "In $ K for all n. This implies that independently of c, for TJ
- 116 -
>
1,
e·
since for a nonnegative random variable X with distribution function F,
00
J [1 - F(u)]d(u).
EX -
o
Hence
implying that the family
E Nc
n _ 1 as C!0.
c
{~~,
c> O} of random variables is uniformly integrable and
0
A
Theorem Q.,1: Assume either In
=In
A
and the conditions of Theorem 4.2 hold, or In
I n * and the conditions of Theorem 4.3 hold.
If there exists a known I> 0 such that
1* > I' then
lim ~(~i a,c\ _
c!O E
ci a,c
Proof:
Since
Nc ~ 1 ( ~ )
1/2
and
nc =
l.
o( (~y/2),
for
some
constant
Thus {nc/Nc' c> O} is a uniformly integrable class of random variables and Enc/Nc
-lasc!O.
For the estimator J n,
EJ n
<
J:{
a(t)
+ [b(t)]2 }dW(t) <
00
for all n. (This is when B4 or B5 or B6 holds.) If B5 or B6 holds, then it can be shown
that for some K> 0, E I n * ~ K for all n. Using Lemma 5.2, the theorem follows.
- 117 -
0
5.3 An Invariance Principle For Interval Estimation
Let
Yn(t)
=
1 2
n / [H n (t)-H(t)]
{Yn(t), t E [O,T]} for some fixed T
> O.
and
e·
define
the
stochastic
process
Yn
=
Take D[O,T] to be the space of all functions on
[0, T] that are right-continuous and have left-hand limits. Y n is an element of D[O,T]
for all n
~ 1.
Let
~
denote weak convergence in the J 1-topology on D[O,T]. Theorem
5.2, stated immediately after the following definition and assumptions, provides a set of
conditions under which Y n ~ Y, where Y is a zero-mean Gaussian process on [0, T].
For tl' t 2 E [0, T], define
(5.2)
Let Y
= {Y(t), tE[O,T]}
be a zero-mean Gaussian process defined on [O,T]
with covariance function
00
=
L rs {rst 1t 2 (1).
r,s=1
Assume F is absolutely continuous with density f and let f(i) denote the i-fold
convolution of f.
B7:
There exist positive constants K and p, 0
ii)(t)
B8:
F(O)
= 0 and
<
p
< 1, such that for i = 1,2,...
< Kpi for all t E R.
there exist positive constants K and p, 0
<
P
< 1, such that for
1,2,...
Theorem~: Assume Bl or B2 or B3 holds. If B7 or B8 holds, then Y n ~ Y.
- 118 -
i
The theorem holds, for example, when the following simple and widely applicable
condition is met:
on
F is absolutely continuous with density f, F(O)
the interval [0, T].
Clearly this implies that
Furthermore, it can be shown that B8 also holds.
=0,
and f is bounded
the assumption
B3 is
met.
(See, for example, Wold(1981),
W
Theorem 2.3.3.) Thus Y n ... Y.
By definition,
Let
and
Y~ = {Y~(t), tE[O, T]}.
The lemma due to Frees (Section 5.2) asserts that
sup IYn(t)-Y~(t)1 _
tE[O,T]
as n -
00.
n 1/ 2
L
F(k)(T)
-
0
k>m
Thus Y n and Y~ have the same limiting distribution, a fact which
IS
exploited here.
In Section 5.3.1 calculations verify that the finite dimensional distributions of
converge weakly to those of Y.
Y:
1
Lemma 5.5 of Section 5.3.2 gives sufficient conditions
for "tightness" of the sequence {Y~}. The proof of Theorem 5.2 appears at the end of
Section 5.3.2.
5.3.1 Asymptotic normality of finite-dimensional distributions
For arbitrary positive integer
e,
let
l =
- 119-
(t1,... ,t e)', where t1, ... ,t e are fixed
nonnegative
real
The
numbers.
Cramer-Wold
device
is
used
to
show
that
[Hn(t l ), ... , Hn(t )), suitably standardized, has asymptotically a multivariate normal dise
tribution. Thus, let! = (al'... ,ae)' be any e-vector of real numbers. Define
and
By a straightforward extension of the proofs found in Section 3 of Frees (1986ii), it is
shown that n
1 2
/ [D n U)-D *U))
n
=
E~1=1 a.Y'n(t.)
1
1
is asymptotically normal. The method
is to first project DnU) onto the individual observations Xl' ... , X n , yielding a statistic
"
D nU) which is the sum of i.i.d. random variables. Using an array central limit theorem
n 1/2 [DnU) - D:U))
is
shown
to
be
asymptotically
normal.
The
remainder
n 1/2 [D n U) - D n U)) goes to zero in probability.
For 1 $ c $ rVs, i, j=l, ... ,
e, define
and
Theorem
M:
Assume Bl or B2 or B3 holds. Then
n
l
2
/
[D n U)-D:U))
where O'r = E oo _lrs{rst(l), and
...
r,s...
Q N( O,O'r)
...
thus the finite-dimensional distributions of y'
II
- 120 -
e-
converge weakly to the finite-dimensional distributions of Y.
1\
Proof: The proof of Theorem 5.3 is begun by calculating the projection DnUJ. Let
and
By the definition of F ~k) (t) we have
Adopting the convention that U - X)'
= (t 1 - X, ... , t e- X), it follows that
Define the projection
DnU)
= {t
E(DnU)IX.)}
. 1
J
-
(n-1)D~U)
J=
so that
= n-1
t
f:
k{ B(k-1)U - X.) - B(k)U)}.
j=1 k=1
- 121 -
J
~
~
We next calculate Var{DnU)}, Var{DnU)}, and Cov{DnU), Dn(l)}·
e·
By definition,
COy {
-
B
(r-c)
COY
U - (Xl +... + Xc», B
~
{ .L..J1 a i F
(r-c)
(l- (Xl +... + Xc» }
~
(t i - (Xl +... + Xc», L..J a· F
.
1=
-
(s-c)
J=
(s-c)
1 J
(t. - (Xl +... + Xc»
J
-
~r8t(c).
Thus,
By a direct calculation
Lm
CoY
{(r)
(s)}
Bn U), Bn (l)
r,s=1
=
Finally,
=
COY
{
1 n
m
Lm B(r)
n (l), n- L L
r=1
=
Lm
r,s=1
S COY
8[B
(8-1)
j=1 8=1
{(r)
(8-1)
}
B n U), B
(l-X 1)·
Since
- 122 -
(l-X.) - B
J
(8)}
(l)]
}
(r)
COy { B n
(1), B
(8-1)
(1 -Xl)
}
we can conclude that
Lemma Q..2: Assume B1 or B2 or B3 holds. Then
00
L:
rserst(l)
<
00.
r,s=l
Proof:
For every 1 ~ c ~ rAs, i, j = 1,... ,e,
(5.3)
Thus,
<
L Lr
(r)
00
rsF
r=l s=l
<
(t i )
L: L:s rsF
00
+
(5)
s=1 r=l
(tj )
00
by the Heyde lemma (given in Section 4.2) if B1 holds or by (4.12) if B2 or B3 holds.
Hence by definition
e
00
e ..(
. ""
~ a·a·
I J ""
L...J rsl rSIJ 1) I
1,j=1
r,s=1
- 123 -
<
00 •
o
Lemma~:
Assume B1 or B2 or B3 holds. Then
Proof: Let X nj = n
-1/2
m
{(k-1)
)} .
Ek=lk B
U -Xj ) - B( kU),
J = 1,... ,n. By definition,
n
- .LX
..
1 nJ
J=
Also, E X nj = 0, j = 1,...,n, and
:r~
=
var{ .t 1 X nJ·}
m
=
J=
For
l
>0
L
rS{rst(l) -
r,s=l
let
We verify the Lindebergh condition
o.
The first step is to bound the integrand by an integrable function of u:
- 124 -
which
is
integrable with
lim X (u E Sn)
n-oo
convergence.
=0
M:
to the
measure dF
by
Lemma 5.2.
Since
for any finite u, the Lindebergh condition holds by dominated
The lemma follows from a standard array central limit theorem.
Serfling(1980), Section 1.9.2.)
Lemma
respect
0
Assume B1 or B2 or B3 holds. Then
Proof: By prior calculations,
It is shown using dominated convergence that
(5.4)
and
(5.5)
which will imply the result.
- 125 -
(See
e·
<
00.
For fixed rand s with c ~ 2, n (~r\~)(~:~) Next,
I n(~r\i)(~'=-i)
<
- rs
I
O. Thus (5.4) holds.
$ 2rs. We can now write
00.
For fixed rand s,
I n (~rl(i)(~'=-i)
Proof of Theorem
M: By Chebyshev's inequality and Lemma 5.5,
- rs 1-+0 , so that (5.5) follows.
It follows from Lemmas 5.3 and 5.4 that
nl/2(Dn<l)-D~<l)) Q
- 126 -
N(O,O'f).
0
Now suppose 0 ~ t 1 ,... ,t ~ T. Then the finite-dimensional distributions of Y~ are
e
asym ptotically
multivariate
normal
by
E!
a· Y(t i ) is a N(O,q2 ) random variable.
t
1=1 1
...
the
Cramer-Wold
device.
Since the choices of
e and
By definition
a = (a ,... ,ae)'
l
...
were arbitrary, we see that the finite-dimensional distributions of Y~ converge weakly to
o
the finite-dimensional distributions of Y.
5.3.2 Tightness of the process
The assumptions B7 and B8 will be shown to be sufficient for "tightness" of the
sequence {Y~}.
Lemma 5.5 states the principal result of this subsection.
Used in
conjunction with Theorem 5.3 it establishes sufficient conditions for weak convergence of
Y~ to Y.
Lemma
M:
for all n
~
Assume B7 or B8 holds. Then there exists a positive constant K such that
1.
The proof of Lemma 5.5 appears at the end of this subsection after several intel'mediate lemmas.
Some additional notation will be required.
For i = 1,2.... let
Suppose {XcI' ... , Xci}
subset of i random variables from X 1,,,,,X n and let E ci denote summation over all
such
subsets.
Define
S.=X
Cl
c 1 + ..·+X.,
Cl
\Vith the new notation one may write
- 127 -
S Cl.(1) = x( S Cl. E (tl,tn])
- P'l'
;t.
1
is a
en
and
F~i)(t)
(ir1I;:x(Sci~t),
=
CI
and
Thus
(5.6)
Define
and
U nijkl = #(Snijkl)
let
j
k
be
the
cardinality of S D1J
"kll'
t:.
Upper
bounds
e
for
n 2( i )-1( )-1( )-1( )-1 U nijkl and E[Sci(1)Sck(2)] 2 are found in Lemmas 5.6 and
5.7, respectively. The proof of Lemma 5.5 follows immediately from these lemmas.
Lemma
M:
With U nijkf as defined.
j
k e
n2( i )-1( )-1( )-1( )-1 U nijkl < 7ijkl.
- 128 -
one of the following two possibilities must occur.
(a) One observation is shared by all four sums. An upper bound on the number
of
· can occur IS
. n (n-1)(n-1)(n-1)(n-1)
ways t h IS
i-I j-1 k-1 i - I '
V nijke n ( n-1)(n-1)(n-1)(n-1)
i-I j-1 k-1 i-I
It is easily shown
that
kn
= n -1··
IJ c..
(b) One observation is shared by two sums, while a second observation is shared
by the other two sums.
choose the four sums is
In this case an upper bound on the number of ways to
(~)n(n-1)(i~n(j=D(k=t)(e=D.
One then has that
The lemma follows by adding the bounds obtained in cases (a) and (b).
0
For the following lemma, let i, j, and c be nonnegative integers such that i 2: 1,
j 2: 1, and 0 S c S iAj. Let Xl'.",X i+j _ c be i.i.d. random variables having distribution
F. Define Si
= Xl +".+X i and
Sj
= Xl +.,,+X
C
+X i + 1 + ... +X i + j _ c ' so that Sj and
S. have c random variables in common.
J
Lemma
U:
Assume B7 or B8 holds.
Then for i,j = 1,2,... there exists a constant K
independent of i and j such that
Proof:
First suppose that i S j and c
< j. If B7 or B8 holds, then
- 129 -
_(CI
C
C
F (i-C)(t 2 _U) - F(i-C)(t C U)][F O- )(t 3 -U) - F O- )(t 2 -U)]dF(C)(u)
-00
$ Kp
j-c
.
(t 3 -t 2 )
JOO[F (i-c) (t
2 -u) - F
(i-c)
l
(t1-u)JdF
(c)
(u)
-00
If c
= i = j, then Si = Sj and the inequality again holds.
C$ i $j. Similarly, if c $ j $ i, then
Thus,
which completes the proof of the lemma.
0
- 130 -
Thus it holds whenevel'
Proof of Lemma
M:
By Lemma 5.7 and the Cauchy-Schwarz inequality, there exists a
constant K such that
for all i, j, k, and
e.
Thus, by (5.6) and Lemma 5.6,
<
f:
i,j,k, e=l
7ijkeK(t _t )2(piVk)1/\pjV€)1/2
3 1
for all n.
Proof of Theorem Q.2.:
square
continuous
E [Y(t 1 ) - Y(t 2 )]2 -
on
With F continuous, the process Y can be shown to be mean
the
0 as
interval
t 1 - t 2.
[0,
An
T].
That
is,
for
tl't 2 E [0,
immediate consequence of this
E[Y(T)-Y(T-)]2 = 0 and thus P{Y(T),t;Y(T-)} = O.
That Y~
is
T],
that
Q. Y follows from
Theorem 5.3 and Lemma 5.5, using Theorem 15.6 of Billingsley (1968). Thus Y n
Q. Y
0
by the remarks at the beginning of this section.
5.4 A Sequential Bootstrap Confidence Band
As
in
the
previous
section,
let
Yn(t)
= n 1/ 2[H n (t)-H(t)]
and
Yn
=
{Yn(t), tE[O,T]}. Theorem 5.3 gives sufficient conditions for Y n to converge weakly
to Y, where Y is a zero-mean Gaussian process. Define the random variables
- 131 -
Sn =
Since Y n
Q.
has that Sn
sup IYn(t)l, n = 1,2,... and S = sup IY(t)l.
tE[O,T]
tE[O,T]
Y and the L oo norm is continuous in the Skorohod metric on D[O,T], one
D
-+
S.
Let G(x) = P( S:5 x) for x E R.
If the distribution G were known, then a confidence band for {H( t), t E [0, T]}
could be constructed in the following way.
Let G- 1(y)
= inf{ x: G(x) ~ y}.
For a
target confidence level of 1- Q, take the confidence band
{ Hn(t) ±
(5.7)
G-l(1_Q)
n
}
1 / 2 ' t E [O,T] .
Then
P { Hn(t) -
as n -+
00,
G-l(1
n
1/;
Q)
< H(t) :5 Hn(t) +
G- 1(1- Q)
n
1 / 2 ' for all t E [O,T]
}
assuming G is continuous at G- 1(1-Q). Thus if G were known, (5.7) would
give a reasonable large sample confidence band for {H(t), t E [0, T]}.
In applications,
the distribution G will be completely unknown. The remainder of this section describes
how G- 1(1-Q) and a corresponding confidence band can be estimated using a bootstrap
procedure.
It is assumed that the following conditions on F and G hold.
B9:
F is a distribution on the positive real line (i.e., F(O)
= 0).
Furthermore, F is
absolutely continuous with density f such that f is continuous a.e. with respect
- 132 -
e
to Lebesgue measure. There exists L > 0 such that
sup f(x) < L.
XE[O,T]
B10: G is continuous and strictly increasing in a neighborhood of G- 1(1- a).
The estimator H n is calculated from a random sample X 1 ,oo.,X n of random
variables having distribution F. Let F n =
F~l)
be the ordinary sample distribution func-
tion. Bootstrap samples will be drawn from a smoothed version of F n' The role played
by the density f in the proof of Theorem 5.3 motivates the smoothing of F n.
Let K be a real bounded function vanishing outside the interval (0,1) such that
J K(y)dy =
1. For h n > 0 define the usual kernel density estimator
fn(x)
-
{
0,
x
L:
_I_ n
(-Xi)
K -h- ,
nh n
i=l
n
x
<
~
0,
O.
One more assumption will be required.
00
B11: The sequence of positive constants {h n }n= 1 is such that h n -- 0 and fn(x) -- f( x)
a.s. for every continuity point x of f.
For t E R, define
t
F\(t) -
f
fn(u)du.
o
... (k)
...
...
Finally, let F n
be the k-th convolution of F n and H n be the renewal function
...
...
00
... (k)
associated with F n , so that Hn(t) = E =l F n (t).
k
For some positive integer M let X;l'.oo,X;n' i = 1,oo.,M, be i.i.d. random variables
from the distribution
Fn'
For i
= 1,... ,M define
- 133 -
where
Here
~c
denotes the summation over all subsets of k random variables drawn from
X.* , ...,X.* .
11
Hn1· (t)
(Thus the definition of
10
in terms of X~ , ..., X~ is identical to the
11
10
definition of Hn(t) in terms of Xl'... ,X n .)
*
Letting Y Dl.(t) = n
1/2
m
H(k)
[H n
.(t)-l~k-l
F n (t)], define
H
Y *. = {Y *.(t), t E [0, T]} and S *. =
Dl
Dl
Dl
sup \Y *
.(t)1.
,1= 1,... ,M.
tE[O,T] Dl
For x E R, let
1
= M
M
*
L:
XeS . $
i=1
x),
Dl
the
sample
-1
distribution
G n (y) = inf{x:
function
of
Gn(x)~y}, yE[O,I].
the
*
Sni's.
As
was
done
for
G,
take
Emulating the form of (5.7), define a confidence
band for H(t) on the interval [0, T] by
-1
Gn (1-0)
}
1/2' t E [O,T] .
{ Hn(t) ±
n
(5.8)
-1
Theorem 5.4 partially answers the question of when G n (1- 0)
Theorem 5.4:
M,n ---.
Assume B9, BI0 and Bll hold.
00.
- 134 -
-1
IS
consistent for
1
Then G n (1- 0) ---. G- (1- 0) a.s. as
Proof: In the appendix the results of Section 5.3 are modified to show that Y*.
Q Y as
Dl
n
-+ 00,
norm
i = 1,... ,M.
III
D S as n
Thus S * . -+
i = 1,... ,M, by the continuity of the L oo
-+ 00,
Dl
the Skorohod metric on D[O,T].
Given the smoothed sample distribution
- , the random variables Snl,Sn2,,,.,SnM
*
*
*
function F
are independent and identically
n
distributed. Thus
-1
G n (I-a)
as M,n
-+ 00.
-+
1
G- (I-a) a.s.
o
Theorem 5.4 verifies that the bootstrap confidence band (5.8) provides the correct
coverage probability asymptotically:
< H(t) < Hn(t)
= p{ sup In
tE[O,t]
as M, n
-+ 00.
+
. -1
G n (1- a)
1 2
/ [H n (t)-H(t)]1
1/2'
n
for all t E [O,T]}
~ G~\I-a)}
This justifies the use of the bootstrap procedure when the sample size n
and the number of bootstrap samples M are both large.
Section 4.5 yielded a sequential procedure for a fixed-width confidence interval
when estimating H(t) at a single value of t. For estimation of H on the interval [0, Tj,
the analogous problem is to develop a sequential version of the fixed-width confidence
band
{ Hn(t) ± d, t E [O,T] }
- 135 -
where d >0.
Fix d
e·
°
> and for some initial sample size no define the stopping time
Nd -
Then Nd -
00
.
mf{ n
a.s. as d! 0, M -
00.
~
no: n 1/2 d
1/2
d
w
-
Y as d! 0, M -
p{
00,
}
G -1
n (I-a) .
By the relation
under the conditions of Theorem 5.4, N d d YN
~
G
-1
(I-a) a.s. as d! 0, M -
00.
If
then
sup IH N (t)-H(t)1
tE[O,T]
d
~ d}
Thus the following theorem would hold.
Theorem
M: Assume B9, B 10, and B11 hold.
W
If Y N -
Y as d! 0, then
d
p{tE[O,T]
sup IHN(t)-H(t)l~d} d
I-a
as d ! 0, M -
00.
In Section 6.3 we outline one method for showing weak convergence of a randomly
indexed stochastic process such as Y N based on a two-dimensional process. Given small
d
- 136 -
e
6> 0, consider the random variables defined by
[ns] {H[ns](t) - H(t) }
Zn(s,t)=
1/2
'
sE[0,1+6],tE[0,T].
n
(where [.] denotes the greatest integer function) and the associated process
Zn = {Zn(s,t)j sE[O, 1+6], tE[O,T]}.
Part of Section 6.3 describes how if Zn converges weakly in an appropriate topology on
[0, 1+6] x [O,T], then Y N
d
~ Y as d! 0. That Zn might converge weakly under appro-
priate conditions is suggested by Theorem 4.1 and 5.2
In practice one would only conduct the bootstrap procedure on some finite subset
of points in the interval [O,T]. Thus rather than verify the weak convergence of Zn, we
show that for an arbitrary grid of points t 1 ,... ,t e taken from [O,T] one has that
Let (aI' ... ,ae) be any vector of real numbers.
Wold device, E!=laiYn(ti)
Q E!=laiY(ti).
If one also assumes that [mn]2 10g n = o(n),
then as previously asserted Lemma 4.8 implies
probability sequence for i= 1, ... ,e.
By Theorem 5.2 and the Cramer-
{Yn(ti)}n~l
is a uniformly continuous in
By Lemma 1.4 of Woodroofe (1982) the sequence
{E~1= 1a'Yn(t.)}
>1 is also uniformly continuous in probability.
1
1
n_
a.s. as d! 0, E!=lai Y Nd(t i )
Since N d /d- 2 ~ c 2
Q E!=lai Yet) by Anscombe's Theorem. Again using the
Cramer-Wold device, it follows that (YNd(tl)'''''YNite))
Q (Y(tl), ... ,Y(t e )) as d!O.
(See Section 1.3 of Woodroofe (1982) for a discussion of uniformly continuous in
probability sequences and Anscom be's Theorem.)
- 137 -
It should be noted that while we have taken the bootstrap sample size to be equal
to the sample size n, this is not necessary in practice. If n 1 is the sample size of each of
the M bootstrap samples, then it can be seen that all of the asymptotic results of this
section hold if n 1 .... 00 as n .... oo. Another point is that one probably need not smooth the
empirical distribution before resampling to get useful confidence bands. The convolution
of F n with the kernel K provided a means for showing weak convergence of the
bootstrapped process.
While the smoothing has not been shown unnecessary for the
asymptotic results considered here, in the usual finite sample situation one could make
the bandwidth h n arbitrarily small, in effect resampling from the unsmoothed sample
distribution function.
5.5 A Variable-Width Bootstrap Confidence Band
The supremum S of the limiting gaussian process Y is more likely to occur on
subintervals of [0, T] where the variance function"
subintervals where
0'; is comparatively small.
0'; is relatively large, rather than on
The procedure of Section 5.4 ignores this
fact and consequently yields a confidence band which is unduly conservative.
The
method presented in this section overcomes that defect by taking into account the
variance function
where 0
O'r.
The objective will be a confidence band for {H (t), t E [T 1,T2] },
< T1 < T2 <
such that
00.
It will be assumed
that there exists a positive constant (
0'; > (for all tE[T 1,T 2 ].
,.. W""
..,
Theorem 5.2, Y n .... Y, where Y = {Y(t), tE[Tl'T 2 ]) is a zero-mean Gaussian process
AI
having covariance function
- 138 -
e
Here R is the covariance function of Y, defined by (5.2). Let S =
sup
IY(t)1 and
tE[Tl'T 2 J
G(x) = peS ~x) for xEIR.
Let O'~t be the estimator of
0'; defined by (4.13).
Since F is continuous,
by the remarks at the conclusion of Section 4.4.1. Since
sup
I O'nt - O't
tE[T l'T 2 J
(5.9)
Thus O't can be replaced by
0' nt
In
I .... 0
0'; is bounded away from zero,
a.s.
the process Y n and the resulting process will
converge weakly to the process Y. Thus for a large sample size n the confidence band
(5.10)
would have meaning in the sense that
= p{
sup
tE[T ,T J
1 2
I nl/2[H~(t)-H(t)JI ~
G- 1(1-a)}
nt
..... p{S~G-l(1-a)}
I-a.
Unfortunately, as with the supremum distribution G of Section 5.4, the form of G
IS
completely unknown.
For the remainder of this section, it will be assumed that Bll holds, as well as the
- 139 -
following modified versions of B9 and B10.
B9':
F is a distribution on the positive real line (Le., F(O)
= 0).
e·
Furthermore, F is
absolutely continuous with density f such that f is continuous a.e. with respect
to Lebesgue measure.
There exists L
exists a positive constant
B10':
f
such that
> 0 such that
0'; >
G is continuous and strictly increasing in a
f
sup f(x)
XE[O,T]
for all t E [T l' T 2].
neighborhood of
< L.
There
G-1(1- a).
Take M to be a positive integer and let Y *., i = 1,... ,M be the bootstrapped
Dl
processes defined in Section 5.4. Define
-*ni =
Y
{Y~i(t)
~, t E [T1,T2]
}
and
i= 1,...,M.
i
=1,... ,M.
-*.
D -S,
By Theorem A1 of the appendix and (5.9), Y
Dl W-+ Y- and -*
Sni -+
For x E R, let
M
1 'LJ
" X(Sni
-* ~x).
= M
i=l
Following the form of (5.10), define the confidence band
(5.11)
Theorem 5.6 gives sufficient conditions for the confidence band (5.11) to have the
correct asym ptotic coverage probability.
- 140 -
Theorem 5.6: Assume B9', B10' and Bll holds. Then G~l(1-a) -
M,n -
G- 1(1-a) a.s. as
00.
The proof is identical to that of Theorem 5.4, replacing Y: , S:i' S,
i
1
1
1
G- by Y*. 5*., 5, 8 - , and 8- , respectively.
0
n
nl'
nl
Proof:
G:
1
, and
By Theorem 5.6, as for the confidence band (5.8), we have that
... 1
P { Hn(t) -
as M, n -
00,
G~ (I-a)
1/2
n
... 1
O'nt
< H(t) < Hn(t) +
G~ (I-a)
1/2
n
O'nt' t E [Tl'T 2 J
}
which provides some justification for this bootstrap procedure.
To define a stopping time for the new procedure, the standard deviation
scaled by a
con~tant
d
> OJ
d is then allowed to go to zero.
Fix d
>0
0' nt
is
and for some
initial sample size no define the stopping time
Upon stopping, take the confidence band
(5.12)
As for the stopping time N
d!O, M -
00.
If Y N
W
d
-
d
...
"'1/2'" 1
of Section 5.4, N d -+00 and N d d - G- (I-a) a.s. as
Y as d! 0, M -
...
00,
w ...
it follows that Y... - Y as d!O, M
N
d
- 141 -
-+ 00
and
e·
Thus the following theorem would hold, giving assurance that the confidence band
(5.12) has the intended asymptotic coverage probability.
Theorem QJ.: Assume B9', B10', and Bll hold. If Y N
d
~Y
as d! 0, then
as d 10, M
- 142 -
-+ 00.
CHAPTER 6
STOPPING TIMES FOR FIXED-WIDTH CONFIDENCE BANDS
6.1 Introduction
In Sections 5.4 and 5.5 the problem of renewal function estimation motivated the
introduction of stopping times for fixed and variable width confidence bands.
In this
chapter we consider such stopping times from a more general point of view.
Let Xl"'" X n be independent and identically distributed random variables having
distribution function F. Suppose R F
= {RF (t), tEl} is an unknown functional process
of F, where IS;; R is either a bounded closed interval, a closed half-line, or R itself. There
often exists an estimator R n ( .) = R n (.
on I.
j
Xl' ... , X n ) of R F ( .) and an associated process
Several authors have exploited weak convergence properties of rn to construct
large sample confidence bands for the functional process R F .
The methodology
introduced in this section extends readily to the case where R F is an unknown functional
process of F defined on I S;; RP, P > 1. See Section 6.6 for an exam pIe.
In applications where I is an unbounded interval one can always find a monotone
increasing function that maps I into a bounded closed interval If.
(The function
estimator may then have to be appropriately defined at endpoints of If.) Thus without
loss of generality and for simplicity of presentation take I to be a closed bounded
interval.
Denote by D(I) the space of functions on I that have left-hand limits and are rightcontinuous.
Throughout this chapter we assume that R F and the estimates rn, n 2:: 1,
belong to the space D(I).
Let p denote the usual Skorohod metric on D(I); weak con-
vergence is then taken to be in the Skorohod J 1-topologyon D(I). The notation ~
(~)
will be used to denote weak convergence (equivalence in distribution) of processes while
g will denote weak convergence of random variables.
Remark:
If I were an unbounded closed interval, then one could let p be the metric k
considered by Lindvall(1973) or any other metric which extends the J I-topology to D(I);
weak convergence could then be assumed to take place in this extended topology. The
discussion in this section would then hold if the supremum norm were replaced by the
norm p( ., 0), where 0 is the function that is identically 0 on I.
A typical approach assumes that rn
interval I such that P{ SUPtEI IOF(t)1
For a desired confidence level of I-a,
'!i
0 F'
< oo} =
where
0F
is a Gaussian process on the
1. Define
O<a<l, let c
= inf{x: GF(x) 2:: I-a}.
If Cn is a
weakly consistent estimate of c, then a sensible large sample confidence band for R
given by
(6.1)
{ R n (t) ± Cn n
-1/2
for then
- 144 -
, tEl },
F
is
assuming G F is continuous at c.
In some special cases G F is independent of F and explicitly known, so that one
can take Cn = c in (6.1).
Estimation of a continuous distribution function F by the
empirical distribution function is such a case, since then G F is the distribution of the
supremum of the Brownian bridge.
Because G F is usually completely unknown,
bootstrap methods have been advocated for estimating c. (See, for example, Bickel and
Freedman(1981) and Csorgo and Mason(1989); other work on bootstrapping empirical
processes is summarized in the latter paper.)
Sometimes the covariance function of
~F
may be known, at least up to its dependence on F. Then a possible approach is to use
either the covariance function or an estimate thereof to simulate independent copies of a
Gaussian process hopefully close in distribution to OF;
taking the supremum norm of
each of these copies, G F and c can be estimated empirically.
If the objective is to estimate R F on I with some uniform prespecified precision,
then the idea of a fixed-width confidence band is appealing. In practice one might want
(6.3)
where d>O is a constant that reflects the desired accuracy.
Typically the minimum
sample size n such that (6.3) holds is unknown and some guidelines for deciding when an
adequate sample size has been reached would be useful.
The analogy with fixed-width
confidence interval estimation of a real-valued parameter O(F) is suggestive. Suppose On
is a consistent estimate of O(F) and one would like the smallest sample size n, usually
unknown, such that
g
P{ IOn - O(F) I
N(O, (7'2) as n .... oo and
s~
~
d }
~
I-a. Further assume n
is a strongly consistent estimator of (7'2.
1/2
(On - O(F))
For the fixed-
width confidence interval problem, a number of authors have considered stopping times
of the type
- 145 -
,
. {
-1/2
}
Nd=mf
n~no: n
SnTOl/2~d,
where no is some initial sample size and TOl/2 is the (1 standard normal distribution.
Sen(1981).)
Ol/2 )-th quantile of the
(See, for instance, Chow and Robbins(1965) and
With appropriate conditions on the sequences {On} and {s~} it has been
shown that
lim
d!O
p{ 10N ,
-
O(F)I
d
< d} = I-Ol,
-
so that for sufficiently small d the confidence band (0 , - d, 0
Nd
,+ d)
Nd
has approximately
the desired coverage probability.
This chapter has as its primary purpose the definition of a stopping time Nd such
that under certain conditions
(6.4)
lim
d!O
p{ IR N d (t) -
RF(t) I
< d, tEl} = I-Ol.
...
-
Let Cn be a strongly consistent estimate of c. For fixed d>O and some initial sample size
no, define the stopping time
(6.5)
upon stopping use the confidence band
{ R N (t)
(6.6)
d
± d, tEl }.
Thus the rule states that one should stop when the width of the confidence band
determined by (6.1) is at most 2d.
The most basic properties of Nd follow directly from its definition.
relation
(6.7)
- 146 -
By the
Nd is finite a.s. for fixed d
> O. If the quantile c of the distribution G F were known,
then for small d > 0 a reasonable fixed sample size would be
. {
-1/2
nd = mf
n~no: n
c ~ d
(6.8)
}
,
where no is the initial sample size that appears in (6.5). Suppose P{ Cn
~
0 } = 0 for all
n~no' Since n d _c 2 /d 2 as d!O, one then has that Nd/nd-+1 as d!O by (6.7).
It is shown in Section 6.2 that if one also assumes that rN
(6.4) holds.
d
~F
Y:i
Several methods for establishing weak convergence of rN
d
as d! 0, then
are given. In
Section 6.3 sufficient conditions are given for the expectation of the stopping time to be
finite for fixed d > 0, as well as for the asymptotic efficiency and normality of Nd as d! O.
Several bootstrap estimators of c are considered in Section 6.4, along with some resulting
properties of the stopping time.
In some applications one might want to allow
confidence bands of variable width. Section 6.5 addresses the problem of stopping times
for such modified confidence band procedures. An example in Section 6.6 illustrates how
a stopping time of the form (6.5) can be used in the construction of a confidence region
for a statistical function R F defined on RP , p ~ 2.
6.2 Asymptotic Coverage Of The Confidence Band
The principal result of this section concerning the asymptotic coverage probability
of the confidence band (1.6) rests on the assumption that r Nd
~ ~F as d! O. Theorem
6.2 gives a single sufficient condition for the weak convergence of rN
then indicates how one might verify this condition.
C1 summarizes the principal assumptions of the introduction.
- 147 -
d
j
some discussion
With previously defined notation, rn
Cl:
= 1.
GF(c)
= I-a and
~
OF on I, where P{sUPtEI
I~F(t)1 <
oo}
G F is continuous at c. There exists an estimator Cn of c
such that Cn .... c a.s. as n .... oo and P{ Cn $ 0 } = 0 for all n ~ no.
The estimator Cn can always be modified, without affecting its consistency, in
order to meet the requirement that P{ Cn $ o} = 0 for all n ~ no.
Tsirel'son(1975)
showed that if OF is a separable nondegenerate mean-zero Gaussian process which is
almost surely bounded on I, then G F must be continuous on the half-line (0, 00). Thus
the continuity condition included in Cl is generally met.
W
Theorem 2...l;, Assume Cl and that rN ....
d
Proof:
From the basic properties of N
1/2
cNd ....c a.s. and N d
0F
d
as d 1o. Then (6.4) holds.
discussed in the introduction it follows that
d .... c a.s. as d 10. Thus the result holds by (6.2).
In practice it will of course be necessary to verify that rN
sufficient condition for this is as follows;
d
~ ~F
0
as d 1o.
A
this condition generalizes the notion of a
sequence of random variables which is uniformly continuous in probability. A sequence
of random variables
f
>0
{Xn}n~l
is uniformly continuous in probability (u.c.i.p.) if given
there exists 6 > 0 such that for all n
~
1
The condition we consider here is as follows.
Given
f
> 0 there exists 6> 0 such that for all n
- 148 -
~
1
e·
Anscom be's Theorem gives conditions for a weakly convergent sequence of random
variables to have the same limiting distribution when the index is replaced by a random
index.
Theorem 6.2 is the analogous result in the present setting;
the same type of
argument employed in the proof of Anscom be's theorem is used here.
Theorem 6.2: Assume C1 and C2 hold. Then (6.4) holds.
Proof: Since Nd/nd-+1 a.s. as dlO by (6.7), for arbitrary fixed
f, f'
>0 one may choose
6 > 0 such that
< t',
making use of C2.
P
Thus p( rN ' rnd) -+ 0 as d 10 and r Nd has the same limiting
d
distribution as rnd' namely the distribution of
0 F'
0
A convenient method for verifying condition C2 is to take recourse to a twoparameter process as introduced below.
For each fixed tEl and given a small 6> 0,
define a process on the interval [0, 1+6] by
[ns] {R[ns](t) - RF(t) }
Zn(s,t)=
1/2
'
sE[0,1+6].
n
(Here [.] denotes the greatest integer function.)
- 149 -
We consider weak convergence of the
two-parameter process
(6.9)
Zn
= {Zn(s, t);
e·
sE[O, 1+6], tEl}
as a condition which implies C2.
Endow
D[O,
1+6]xD(I)
with
the
S-topology
considered
by
Bickel
and
Wichura(1971) and suppose that there exists a process Z on D[O, 1+6]xD(I) such that
Zn
~
Z in the S-topology.
(Bickel and Wichura (1971), for example, give sufficient
conditions for verifying weak convergence of Zn.)
For any n such that
I n/nd -
11
< 6,
choose s such that nds = n. Then for this fixed s
(6.10)
and as a result
(6.11)
+ P { maxln/n -11 sUPln
d
tEl
1/2
{Rn(t) - RF(t)}1
Since Zn ~ Z, by Theorem 2 of Bickel and Wichura(1971),
(6.12)
- 150 -
> t/2ls 1/2 -11 } .
Again using the weak convergence of Zn, for some constant K> 0
(6.13)
P { maxln/n _llsupln
d
tEl
1/2
{Rn (t)-RF (t)}/>t/2/s
~ P { sU PsE [I_6, 1+6] :~) I Zn(s, t) I > K Is
as 6 -+
1/2
-11
1/2
-1/
-I}
-+
}
0
o.
Together (6.11), (6.12), and (6.13) imply C2 holds. This method for verifying C2
can also be used when R F is defined on a compact set Ie R P by considering weak
convergence of the process defined by (6.9) with t E RP , again in the S-topology of Bickel
and Wichura(1971).
Another potential approach for establishing C2 supposes that there exists a
sequence of IT-fields {en} such that {Rn (·) -
R F (·); ell}~=1 forms a reverse
martingale process. Taking 6> 0 and N = [n6], one would then have that
is also a reverse martingale process. It follows from this that
and
consequently
{
SUPtEI
I Rn+k(t)
-
- 151 -
Rn+N(t)
I; en + k }~=l
IS
a
reverse
submartingale. One could then try to bound the probability
using the Kolmogorov inequality for reverse submartingales.
6.3 Properties Of The Stopping Time
By making additional assumptions about the quantile estimator Cn, some
questions about the expectation and rate of convergence of the stopping time can be
answered.
In the fixed-width confidence interval problem, the properties that will be
considered here have been shown to hold for the stopping time N~ under a variety of
conditions on the variance estimates {sfi, n ~ no}.
Due to the similarity in form
between N~ and Nd , by putting the same requirements on the sequence {cfi, n ~ no},
the same properties follow for Nd.
Thus no proofs are given here, the necessary
calculations being identical in spirit to those in .the references cited. A problem is that
conditions that are easily verified for variance estimators may not readily apply to
quantile estimators,
80
discussion is restricted to those conditions which seem most
relevant.
Among the basic properties established in the introduction were that Nd
< 00
a.s.
for fixed d>O and Nd/nd-+1 a.s. as d!O. Lemma 6.1 provides a condition which implies
that for fixed d > 0 the expectation of the stopping time is finite, while as d! 0 the
stopping time is asymptotically efficient in the sense that
(6.14)
The type of arguments needed to prove Lemma 6.1 may be found in Sen(1981), pp. 284
and 287.
- 152 -
Lemma.2..l;, Assume C1 holds and that for fixed d > 0
E { sup">"
_
0 c~ } <
(6.15)
Then ENd <
00
00.
and (6.14) holds.
To show that the expectation of the stopping time is finite, another approach is to
establish the equivalent condition that
L
(6.16)
P{Nd>n}<oo.
n~no
Similarly, an alternative condition for verifying (6.14) is
(6.17)
lim
d!O
nd"l
L
P{Nd>n}=O.
n~nd (l+f)
(See, for example, Sen(1981), p.288.)
In Section 6.4 it is demonstrated that if cn
IS
a
particular type of bootstrap estimator it may be possible to verify (6.16) and (6.17).
In the fixed-width confidence interval case one may sometimes deduce a limiting
distribution for N~ as d! 0 when s~ is asymptotically normal. Similarly, in the present
setting asymptotic normality of c~ may provide a rate of convergence for Nd as d! O.
Let { g(n) } be a sequence of positive numbers such that g(n) is increasing in n,
g(n) .... oo as n .... oo, and
(6.18)
Suppose that
(6.19)
D
--+
N(O, ,,),2) as d!O,
where")' is a finite positive constant. Further assume that
- 153 -
(6.20)
{ g(n) [c~/c2 - 1], n~no } is a u.c.i.p. sequence.
(See Section 6.2.)
Together with C1 these three conditions yield the following result,
similar in derivation to Theorem 10.2.2 of Sen(1981).
Lemma 6.2: Assume C1, (6.18), (6.19), and (6.20) hold. Then
D
N(O, ..,,2) as diO.
6.4 Bootstrap Quantile Estimators
One method that has been suggested for constructing confidence bands of the
form (6.1) is bootstrapping the process rn.
conditionally
independent
with
the
Given Xl', .. ,X n , let Xi, ... ,xriI n be
distribution
_
-
n-1~n
LJi=l
-oo<x<oo, where the bootstrap sample size mn .... OO as n .... oo.
indicator function.)
(Here X denotes the
Next, for tEl define RriIn(t) = Rmn(t; Xi, ... , XriIn)j
the
bootstrapped process is then given by
on I. One then typically makes the assumption that rriI n
~ ~F
as n, mn .... OO so that a
natural estimator of the distribution G F is defined by
The bootstrap estimator of c, the (l-o)-th quantile of G F , is then defined to be the
(l-o)-th quantile of G n, F n .
- 154 -
For computational reasons the bootstrap estimate is often itself approximated by
generating conditionally independent copies of rrit n as follows.
X:t'l"'"
I
X:t'm
I
n
For i
= 1, ... ,
Mn
let
be a random sample from the distribution F n , where Mn-+oo as n-+oo.
For tEl define R*mnl.(t) = Rmn(t; X:t'l"'"
X:t'Imn ). Then the processes
I
on I, i = 1, ... , M n , are conditionally independent and identically distributed. Estimate
G
n, F n empirically by
The bootstrap estimate of c is then taken to be cri
= inf{x: G" n , F n (x) ~ I-a}. Putting
cn = cri in (6.5) then defines a stopping time for terminating the bootstrap procedure.
Due to the conditional nature of tlIe ordinary bootstrap estimator, the feasibility
of establishing unconditional properties of the stopping time may depend on the nature
of the function R F and the estimator R n .
Showing that the stopping time has finite
expectation for fixed d > 0 or has the asymptotic efficiency property (6.9) by confirming
that conditions such as (6.10), (6.11), or (6.12) hold may be difficult or impossible. To
gain some insight into this problem we first examine a condition on the random
distribution function G
F
n, n
which is sufficient for (6.11) and (6.12) to hold. While this
condition will not typically be useful in practice, it does suggest how certain properties
of the stopping time depend on the probability distribution of G n, F n and might not
automatically follow when one uses an ordinary bootstrap estimator.
It is then seen
that by partitioning the sample into independent groups of observations one can instead
estimate a distribution analogous to G n = E G n F by bootstrapping;
, n
- 155 -
if the resulting
quantile estimate is used to define the stopping time, the desired properties will hold.
e·
Condition Cl is replaced by the following.
C3:
With previously defined notation, rn
where
P{ SUPtEI
I gF(t) I <
00 }
YI
gF
= 1.
and rriI n
YI
gF
on I as n, mn -+
00,
= I-a and G F is continuous and
Gn,Fn(O) = 0 for all n ~ no } = 1.
GF(c)
1\
strictly increasing in a neighborhood of c. P{
The sequence {M n } is such that Mil 1 log n = o( 1 ).
C3 implies that cri-+c a.s. Let Gn(x)
-oo<x<oo.
= EGn,Fn(x) = P{ SUPtEI I rriIn(t) I ::; x },
Lemmas 6.3 and 6.4 assume that IGn,Fn(x) -
Gn(x)I-+O completely as
n .... 00 in the following sense.
C4: For every x E R and every Ii> 0,
""
L.J P{
n~no
Lemma
- Gn(x)
2.a:. Assume C3 and C4 hold. Then for fixed d > 0, ENd <
Proof; For all n ~ no, P{ Nd
L
(6.21)
I G n , F n (x)
n~no
>
n} ::; P{ n -1/2 C n*
P{ Nd
>
n} ::;
L
>
I>
Ii } <
00.
00.
d}, so that
P{ cri
>
1 2
n / d }.
n~no
D
1/2
Since G n .... G F , for all n sufficiently large one has that [1 - a - G n (n
d)] < -
some
{j
> O. Thus for large n
=
P{
Gn, F n (n 1 / 2 d) < 1- a
- 156 -
}
{j
for
By a standard probability inequality for empirical distribution functions
p{ xER
sup I G F (x) -G F (x) I> 6/2} ~ Ce -2M n (O/2)2
n, n
n ,n
(6.22)
where C is a constant not depending on F n or F.
(See Serfling(1980), Section 2.1.3.)
Hence by (6.21), C3, and C4
<
00,
which is equivalent to ENd
<
00.
0
Lemma 6.4: Assume C3 and C4 hold. Then lim
d!O
Proof:
nei 1 ENd = 1.
To show that (6.17) holds, we derive a bound for P{ N d > n} which is
independent of d for n
~
nd(1+f).
For any 6> 0 and all d sufficiently small, using the
fact that n ~ c 2 (1+f)/d 2 ,
- 157 -
D
,
1/2
By C3, G n ..... G F and one can choose 6, 6 > 0 such that (1- a + 6) - G n «(1+f)
c)
<
-6' for all large n. For this 6 and sufficiently small d, by C4,
<00
and using (6.22)
L
n~nd (l+f)
P{ N d
> n} <
Thus (6.17) holds and the result follows.
L
P{
cri
>
n1 / 2 d} <
00.
n~nd (l+f)
o
For some statistical functions R F and corresponding estimators R n , if F n is in
some sense close to F then Gn,F n may be close to G n .
This suggests the following
condition which is sufficient for C4.
C5:
[Hadamard-Lipschitz condition of order a.]
K( x) such that for all n and x E R
- 158 -
There exists a real-valued function
IGn,Fn(x) - Gn(x) I :$ K(x)IIF n _Fila for some a>O,
where" f II denotes the supremum norm of the function f.
A
The inequality applied in (6.22) to G n F and G n F can now be applied to F n
, n
, n
and F to show that C5 implies C4:
L
n~no
P{ I G n , F n (x) - Gn(x) I > 6 }
L
<
p{IIF n -FII>[6/K(x)]1/a}
n~no
L
:$
Ce -2n[C/K(x)]
2/a
n~no
<
00.
While C5 seems a more natural condition than C4, in many situations neither of these
assumptions will be verifiable.
However, by modifying the procedure so that one
unbiasedly estimates an unconditional distribution function analogous to G n the desired
properties for the stopping time will automatically follow.
In introducing the modified bootstrap procedure some notation is redefined. Now
let {mn} and {M n } be sequences of positive integers such that mn .....
as n .....
00
and Mnmn :$ n for all n.
observations each;
00
and M n .....
00
Divide the sample into M n groups of mn
denote the observations in the i-th group by X l. I , ... , X.
,
lmn
i = 1, ... , M n . One bootstrapped process will be generated from each of the M n groups
of observations; for convenience the bootstrap sample size will be taken to be mn' For
i = 1, ... , M n let X:t'I"'"
1
'
· 'b
d lstrl
utlOn
R
F ni ( x )
=
X:t'lmn now be a random sample drawn from the empirical
mn-I~mn
~j=I
<x
X (X ij _
),
.(t) = R mn ( t; X· I ' ... , X.
) and then redefine
mnl
1
lmn
- 159 -
-00
< x < 00.
For
tEl
define
R*mnl.(t) = R mn ( tj X:t'I"'"
X:t'lmn )
I
e·
Then the modified processes
on I, i = 1, ... , M n , are unconditionally independent and identically distributed.
Now
redefine G n by
G n can now be estimated in an unbiased fashion by
"
M n X( sup Ir * .(t)1 $; x), -oo<x<oo.
Gn(x)
= M n-1 E'1- 1
tEl mn l
C3 must be changed slightly.
C6:
With previously defined notation, rn
where
~ 0F
= 1.
P{ SUPtEI IOF(t)1 < oo}
and
GF(c)
r~nl ~ 0 F
on I as n, mn
-+ 00,
= 1-0' and G F is continuous and
strictly increasing in a neighborhood of c.
"
P{ Gn(O)
= 0 for all n 2:: no }
The sequence {M n } is such that Mi1 1 10g n
= 0(1).
Let
which
inf{ x:
=
" 2:: I-a}.
= inf{ x: Gn(x)
quantiles, for any
i
for
a
natural
estimate
IS
= 1.
Cn
"*
By a standard probability inequality for deviations of sample
>0
P{
1
"*
cn
- cn*
I
>
i
}
$;
2e-2MnO~,
where on = min{ G n (cri+i)-(I-a), (l-a)-G n (cri-i)}.
- 160 -
(See, for example,
Serfling(1980), p.75.)
Using the fact that G n
RGF
and G F is continuous and strictly
increasing in a neighborhood of c, it follows that lim inf 6n
n....oo
then one has that
Ego=no P{
I ~~ -
I ~~ c~
I>
c~
f
}
I .... 0
<
OOj
completely as n ....
consequently ~ri
00
> O. If Mil 1 log n = o( 1 ),
in the sense that for any
.... c completely and
f
a.s. as n ....
> 0,
00.
Now let the stopping time N d be defined with C n = ~~. When C6 holds, one then
has the properties indicated in Lemmas 6.3 and 6.4.
1\
This can be seen by replacing
1\
Gn F
by G n and G n F by the new G n in the proofs of Lemmas 6.3 and 6.4 and then
, n
, n
making the appropriate simplifications.
The modified bootstrap algorithm results in essentially a grouped sequential
procedure when used in conjunction with the stopping time (6.5). Given the sequences
{M n } and {m n }, the effective sample sizes are defined by the sequence {Mnmn}.
Even if the sample were increased a single observation at a time, one usually would not
want to repeat the bootstrap procedure until either the number of subsamples M n , the
subsample size mn, or both had been increased. Thus the stopping time would only take
values in the sequence {Mnmn}. This seems a disadvantageous feature.
Another point bears on the selection of M n and mn subject to the constraint
Mnmn $ n. The number of subsamples M n need only grow at a rate slightly faster than
log n for complete convergence of ~~ to c~, so it seems prudent to allow the subset size
mn to increase relatively quickly to speed convergence of c~ to c. One might let M n be
of order n A for some A, 0 < A < 1, where typically A would not be greater than 1/2 and
could be decreased as n becomes larger.
- 161 -
6.5 Variable-Width Confidence Bands
In some applications the need for a confidence band of variable width arises quite
naturally. If the variance of rn(t) is large for some values of t and comparatively small
for other values, then a confidence band of constant width such as (6.1) or (6.6) will
often be unnecessarily wide for some values of t.
This suggests scaling the confidence
band width at a point t by an estimate of the standard deviation of the process at that
point.
(It should be noted that this may actually increase the width of the confidence
band at some points.)
In other applications a
modified process of the form
{rn(t)/q(t), tEl} is known to converge weakly, where q is a known nonnegative weight
function. These two situations are treated jointly here, since both lead to essentially the
same modifications of the stopping time (6.5) and the confidence band (6.6).
Let {qn} be a sequence of possibly random nonnegative functions on I and q be a
possibly unknown nonnegative function on I.
Next suppose J ~I is a finite union of
intervals and define the processes rn = { rn(t)/qn(t), tEJ}, n = 1, 2, .... Assume that
rn
!Y
ijF'
where
ijF
is
a
zero
mean
Gaussian
process
W
In the case where rn .... OF on I and q(t) = Var{ 0F(t) },
that q is bounded away from zero and infinity on J;
consistent estimator of q on J, then ijF(t)
J~I
on
J
such
that
is necessarily such
if additionally qn is a uniformly
~ { 0F(t)/u(t), tEJ}. Alternatively, one
might have that qn= q for all n, where q is a known weight function that provides an
appropriate normalization of rn.
Set
- 162 -
e
and suppose that C is such that GF(c)
=
I-a and G F is continuous at C.
Given a
weakly consistent estimator cn of C, a reasonable large sample confidence band for R F
on J is then given by
(6.23)
With the preceding assumptions,
(6.24)
= P{ sup Irn(t) I $ cn }
tEJ
-+ I-a.
Although the goal is no longer a fixed-width confidence band, by scaling the function qn
by a constant d>O a stopping time can be introduced; the constant d is then let go to
zero in order to study the asymptotic coverage probability.
Assume cn -+c a.s. and
define the stopping time
...
N
d
. {
= mf
n~no:
n -1/2...
Cn $ d } ,
where no is again an initial sample size. Let
{ R... (t) ± dq... (t), tEJ }
(6.25)
Nd
Nd
be the associated confidence band.
It follows from the strong consistency of c n that if P{ c n
then C...
Nd
-+ c a.s. as d!O.
:5 0 } = 0 for all n ~ no,
By a relation like (6.7), one then has that
- 163 -
[N d ]l/2 d
-+ C as
d 10. Thus with the additional condition that
(6.26)
lim
d 10
p{ I R ...N
=
by (6.24).
r...N
d
~
gF as d 10,
e·
(t) - RF(t) I $ dq... (t), tEJ }
Nd
d
1 -
Q
Thus one can define a meaningful stopping time for the confidence band
(6.23).
Letting Dd = inf{ n~no: n -1/2c $ d }, as for the stopping time Nd one has that
Nd
is finite a.s. for fixed d>O and
Nd /
Dd .... 1 a.s. as d
10. Other properties of the
stopping time such as asymptotic efficiency and normality follow under conditions on
Nd
and cn identical to those considered for Nd and cn in Lemmas 3.1 and 3.2.
6.6 Example: Empirical Processes
Let
~,
~l"'"
~n
be i.i.d. random vectors having continuous distribution
function F defined on RP for some p ~ 1. Define the empirical distribution function by
where c(l!) is equal to 1 if all p coordinates of l! are nonnegative and is otherwise equal
to O. Letting Vn(~) = nl/2[Fn(~) - F(~)], the usual empirical process is then defined
by V n = {Vn(~), ~ERP}. Let EP=[O, l]P be the p-dimensional unit cube. With ~'=
(xl' ... , xp) and F l , ... , F p the marginal distributions of F, define the reduced
empirical process by
- 164 -
W
Since F is continuous, Un ~ U, where U is a tied-down zero-mean Gaussian process on
E P . (See Bickel and Wichura(1971).)
Thus P{ sup
xEE
pi U(x) I < oo}
-
= 1.
For the
remainder of this section, let
GF(t) = P{ sup
E I U(?,S) I
xE P
:5 t },
-00
< t < 00.
If c is such that GF(c) = I-a, then G F is continuous and strictly increasing
In
a
neighborhood of c.
As is well known, if p= 1 then Un ~ B as n
on [0, 1].
~ 00,
where B is the Brownian bridge
Thus when F is continuous G F is the distribution of the supremum of the
Brownian bridge. Bickel and Freedman (1981) have shown that bootstrapping Un yields
a strongly consistent estimate of c, the (l-a)th quantile of G F .
Letting Cn be the
bootstrap estimate and rn(') = U n (·) on [0, 1], define the stopping time Nd by (6.5)
and the associated confidence band by (6.6).
Now let p
coordinates of
~
~
1. When p
~
2 the corresponding G F is usually unknown unless the p
are independent. Again suppose c is the (l-a)th quantile of G F and
use the strongly consistent bootstrap estimator discussed by Beran(1984) to define the
stopping time (1.5). Upon stopping, use the confidence region
(6.27)
for the distribution F. It then follows that if U N
- 165 -
d
~
U as d 10, then
(6.28)
•
-+1-0
as d! 0 by the assumptions and the inequality (6.7).
Bickel and Wichura (1971) note that if {N n } is a sequence of positive integervalued random variables, Cn-+OO, and Nn/cn converges in probability to a positive
"!! U.
random variable, then U Nn
"!! U since
Thus U Nd
Nd/nd -+ 1 a.s. as d!O. Alterna-
tively, weak convergence could have been shown by using the process Zn considered in
Section 6.2 and the method considered there; weak convergence of Zn in this instance
has already been confirmed.
(See the remarks following Theorem 6 of Bickel and
Wichura(1971).)
In this chapter we have defined stopping times for several types of confidence
band procedures and considered various properties of these stopping times. That these
sequential procedures achieve the correct coverage probability asymptotically was seen
in Sections 6.2 and 6.5. The multivariate empirical process example suggests how these
sequential methods can be readily extended to confidence regions in p-dimensional
Euclidean space, allowing a more general treatment than considered here.
- 166 -
APPENDIX
This appendix concerns the weak convergence of the bootstrapped process Y~I
defined in Section 5.4.
Theorem API:
Assume B9 and Bll hold.
Then Y~I
W
-+
Y as n
-+ 00 ,
where Y =
{Y(t), t E [O,T]} is the zero-mean Gaussian process defined by (5.2).
The proof of Theorem API may be found at the end of this appendix.
As in
Section 5.3, weak convergence will follow from asymptotic normality of the finite
dimensional distributions of Y~l (Theorem AP2) and a condition sufficient for tightness
(Lemma AP3).
Assume the original random variables Xl' X 2 , ... are defined on a probability
space (0, GJ, P) and fix wED.
By the definition of fn, B9, and Bll, there exists a
constant L(w) such that fn(t)::5L(w) for all t::5T, for all
n~1.
This implies that for
fixed wED there exist positive constants K and p, 0 < P < 1, such that for i=I, 2, ... ,
and n=I, 2, ... ,
(API)
Here
f~i)
is the i-th convolution of f n. See, for example, the proof of Theorem 2.1.3 of
Wold(1981) for verification of AI. Thus for fixed wED, i=I, 2, ... , and n=I, 2, ... ,
(AP2)
To establish Theorem AI, it is shown that for fixed wEn,
Y~1 ~
Yj
lD
order to
accomplish this, frequent use is made of API and AP2.
Asymptotic Normality of Finite-Dimensional Distributions
For arbitrary positive integer
nonnegative
real
numbers.
[H n1 (t 1 ), ... , Hn1 (t e)],
The
e,
let
1
= (t 1 ,... ,t e)', where t 1 ,... ,t e are fixed
Cramer-Wold
device
is
used
to
show
suitably standardized, has asymptotically a multivariate normal
distribution. Thus, let! = (al ,... ,ae)' be any e-vector of real numbers. Redefine
and
For 1 ~ c
~
that
rVs, i,j=l, ... ,
e, let
and
For all n,
(AP3)
~
KT p
- 168 -
(rAs)
•
e-
Thus for some constant K
00
E
r,s=l
rs I e nrst (1) I ~ K
-
for all n ~ 1. Since e nrst (1) ~ erst(l) a.s., by dominated convergence
-
-
2
E
00
rs enrst(l) ~ O't a.s.
r,s=l
Theorem AP2: Assume B9 and BU hold. Then
where
O'_r =
E oo
_lrserst(l),
and thus the finite-dimensional distributions of Y*n1
r,s_
converge weakly to the finite-dimensional distributions of Y.
The proof of Theorem AP2 is begun by calculating the projection DnCt.)-DriGJ
onto the individual observations Xi!, ... ,Xin' Let
and
By the definition of
F~k)(t) we
*)
E ( -(k)
F n1 (t) IXu
have
=
Again using the convention that
-(k-1)
*
-(k)
(k/n)F n
(t-X U )+(1-k/n)F n (t).
U-X)'
=
(t l -X, ... , t e-X), it follows that
- 169 -
e·
Define the projection
so that
1\
1\
We next calculate Var_ {DnOJ}, Var_ {DnOJ}, and Cov_ {DnOJ, DnOJ}.
Fn
Fn
Fn
By definition,
Cov
Fn {
(s-c) U-(X * +·,,+X * ))}
B n(r-c) U-(X *ll + ..·+X *
1c )),B n
ll
1c
Thus,
- 170 -
By a direct calculation
m
L
r,s=1
(r)
Cov
Fn
(s)
{B n1 (t), B n1 (t)}
Finally,
Lm
s Cov
r,s=1
{(r)
Fn
(s-l)
B n1 (t), B n
(t - Xi!)}.
Since
(r)
Cov_ { B n1
Fn
(s-l)
(t), B n
(t -Xi!)
}
we can conclude that
Lemma API: Assume B9 and Bll hold. Then
- 171 -
* = n -1/2 ~k=l
m k {(k-1)
(k)}
Proof: Let X nj
Bn
U - X*
1j ) - B n U) , j = 1,... ,n. By definition,
-
Also, E... X*.
Fn
nJ
= 0, j = 1,... ,n, and
:r~
For
n
L:X* ..
. 1 nJ
J=
= Var...
{t X~.}
F n j=l
J
-
m
e
L: rs nrst (1) -+
r,s=l
f> 0 let
We verify the Lindebergh condition
lim
n-+oo
- o.
If
:r* . . . 0';
...
= 0, then n
1 2
/
[I)nU) - O:U)] converges in mean square and hence in
distribution to the degenerate random variable which is identically O. Thus assume f~ .....
0'...; > O.
Suppose
te =
max{t 1, ... ,
tel.
Then
say, for all n ? 1 by AP2. Hence there exists no such that for n ?no
- 172 -
is identically the empty set. Thus the Lindebergh condition holds. The lemma follows
from a standard array central limit theorem. (See Serfling(1980), Section 1.9.2.)
0
Lemma AP2: Assume B9 and Bll hold. Then
Proof: By prior calculations,
A
-
nVar{DnUn
A
+ nVar{DnUn-2nCov{DnU),DnU,)}
It is shown using dominated convergence that
(AP4)
and
(AP5)
which will imply the result.
First, n(¥f1
t (~)(¥-=-~) <
rs. Therefore, by AP3, there exists a constant K
c=2
independent of n such that
- 173 -
:S K.
For fixed rand s with
Next,
c~2, n(~r\~)(~.:g)
I n (~r\i)(~-=-i)
- rs
-
O. Thus AP4 holds.
I <
2rs. Again by AP3, there exists a constant K
- rs
I -+ 0
independent of n such that
:S K.
For fixed rand s,
I n (~rl(i)(~'=-i)
, so that AP5 follows.
o
Proof of Theorem AP2: By Chebyshev's inequality and Lemma AP2,
It follows from Lemmas API and AP2 that
n1/2(DnU)-D~U») Q
N(O,I1;) .
...
Thus the finite-dimensional distributions of Y~l are asymptotically multivariate normal
by the Cramer-Wold device.
Since the choices of
e
By definition E~ la. Y(t.) is a N(O,11 2 ) random variable.
and !!;
1=
= (al'... ,ae)'
1
1
.t
were arbitrary, we see that the finite-
- 174 -
dimensional distributions of Y: 1 converge weakly to the finite-dimensional distributions
ofY.
0
Tightness
Lemma AE,;l: Assume B9 and Bll hold. Then there exists a positive constant K such
for all n
~
1.
Using API, the proof of Lemma 5.8 is readily adapted to show that for i,j=l, 2, ... there
exists a constant K such that
for all n ~ 1.
Now let S:i' S:j' S:k' and
S:e
represent arbitrary sums of i, j, k, and
random variables, respectively, from the bootstrap sample Xir, ... , Xin.
S*"kl1 =
DlJ c.
{CS*.,S*.,S*k,S*I1): EN rS*·Cl)Sn*·Cl)S*kC2)S*nC2)l
DI
nJ n
nC.
F n L DI
J
n
nC. J
i-
e
Define
o}
and U*"k
nij c.l1 = #CS*"k
nij c.l1 )' the cardinality of S*"k
nij c.l1 . Then, as in Lemma 5.7, it follows
- 175 -
that
2(n)-I(n)-I(n)-I(n)-1
*
n
.1
k
e U.
J.
nlJ'ke
<
7 ij k
e-
e.
Thus
for all i, j, k, and
e.
Now using the cardinality arguments of Section 5.3.2 and mimicing
the proof of Lemma 5.6,
for all
n~ 1.
0
Proof Qf Theorem API: As noted in the proof of Theorem 5.2, P {Y(T):;= Y(T-)}
when F is continuous.
That
=0
Y~1 Q Y follows from Theorem AP2 and Lemma AP3,
again using Theorem 15.6 of Billingsley (1968).
- 176 -
0
BIBLIOGRAPHY
Anscombe,
F.
(1952).
Large sample theory of sequential estimation.
Proc.
Cambridge Philos. Soc. 48 600-607.
Barlow, R. E., Marshall, A. W., and Proschan, F. (1963). Properties of probability
distributions with monotone hazard rate. Ann. Math. Statis. 34, 375-389.
Barlow, R. E. and Proschan, F. (1964). Comparison of replacement policies and
renewal theory implications. Ann. Math. Statist. 35, 577-589.
Barlow, R. E. and Proschan, F. (1965) Mathematical Theory of Reliability. Wiley,
New York.
Barlow, R. E. and Proschan, F. (1975) Statistical Theory of Reliability and Life
Testing. Holt Reinhart and Winston, Inc., New York.
Beran, R. (1984). Bootstrap methods in statisti~s.· Jber. Deutsch. Math.-Verein. 86
14-30.
Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap.
Ann. Statist. 9 1196-1217.
Bickel, P. J. and Wichura, M. J. (1971). Convergence criteria for multi-parameter
stochastic processes and some applications. Ann. Math. Statist. 42, 1656-1670.
Billingsley, P. (1968). Convergence of Probability Measures. New York, Wiley.
Blum, J. R. (1954i).
Approximation methods which converge with probability one.
Ann. Math. Statis. 25, 382-386.
Blum, J. R. (1954ii). Multidimensional stochastic approximation methods.
Math. Statis. 25, 737-744.
Chow, Y. S. (1965).
Ann.
Local convergence of martingales and the law of large numbers.
Ann. Math. Statis. 36,552-558.
Chow, Y. S. and Robbins, H. (1965). On the asymptotic theory of fixed-width
sequential confidence intervals for the mean. Ann. Math. Statist. 36 457-462.
Chung, K. L. (1954).
25, 463-483.
On a stochastic approximation method.
Csorgo, S. and Mason, D. M. (1989).
Statist. 17, in press.
Ann. Math. Statis.
Bootstrapping empirical functions.
Ann.
Derman, C. (1956). An application of Chung's lemma. Ann. Math. Statis. 27, 532536.
Fabian, V. (1968). On asymptotic normality in stochastic approximation.
Math. Statis. 39, 1327-1332.
Ann.
Frees, E. W. (1983). On construction of sequential age replacement policies VIa
stochastic approximation. Institute of Statistics Mimeo Series #1525, the
consolidated University of North Carolina.
Frees, E. W. (1985). Weak convergence of stochastic approximation processes with
random indices. Sequential Analysis 4, 59-82.
Frees, E. W. (1986i). Warranty analysis and renewal function estimation.
Logist. Quart. 33, 361-372.
Frees, E. W. (1986ii).
1366-1378.
Nonparametric renewal function estimation.
Naval Res.
Ann. Statis.
14,
Heyde, C. (1964). Two probability theorems and their application to some first
passage problems. J. Austral. Math. Soc. Ser. B 4, 214-222.
Kersting, G. (1977). Almost sure approximation of the Robbins-Monro process by
sums of independent random variables. Ann. Prob. 5, 954-965.
Kiefer, J. and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a
regression function. Ann. Math. Statis. 23, 462-466.
Lindvall, T. (1973). Weak convergence of probability measures and random functions
in the function space D[O, 00]. J. Applied Prob. 10, 109-121.
McLeish, D. (1974). Dependent central limit theorems and invariance principles. Ann.
Prob. 2, 620-628.
McLeish, D. L. (1976). Functional and random central limit theorems for the RobbinsMonro process. J. Appl. Prob. 13, 148-154.
Robbins, H. and Monro, S. (1951). A stochastic approximation method. Ann. Math.
Statis. 22,400-407.
Robbins, H. and Siegmund, D. (1971). A convergence theorem for nonnegative almost
supermartingales and some applications.
Optimizing Methods in Statistics
(J. S. Rustagi, Ed.), 233-257. Academic Press, New York.
Ruppert, D. (1982). Almost sure approximations to the Robbins-Monro and KieferWolfowitz processes with dependent noise. Ann. Prob. 10, 178-187.
Sacks, J. (1958). Asymptotic distribution of stochastic approximation procedures.
Ann. Math. Statis. 29, 373-405.
Sen, P. K. (1981). Sequential Nonparametrics:
Inference, John Wiley and Sons, New York.
Invariance Principles and Sta tistical
Serfling, R. J. (1980). Approximation Theorems of Mathematical Statistics, John Wiley
and Sons, New York.
- 178 -
e·
Sielken, R. L. (1973). Stopping times for stochastic approximation procedures.
Wahrscheinlichkeitstheorie Verw. Geb. 26, 67-75.
Z.
Singh, R. S. (1977) Improvement on some known nonparametric uniformly consistent
estimators of derivatives of a density. Ann. Statist. 5, 394-399.
Smith, W. L. (1954). Asymptotic renewal theorems.
48.
Proc. Roy. Soc. Edinb. A 64, 9-
Strassen, V. (1967). Almost sure behaviour of sums of independent random variables
and martingales. Proc. Fifth Berkeley Symp. Math. Stat. Prob. (Ed. L. LeCam,
et. al.), Los Angeles, University of California Press, Vol. 2, pp. 315-343.
Tsirel'son, J. L. (1975). The density of the distribution of the maximum of a Gaussian
process. Theory Probab. Appl. 20 847-856.
Wold, D. E. (1981).
On smooth estimation of the renewal density.
dissertation, University of North Carolina-Chapel Hill.
Woodroofe, M. (1982). Nonlinear Renewal Theory in Sequential Analysis, SIAM,
Philadelphia.
- 179 -
Ph. D.
© Copyright 2026 Paperzz