Krishnamoorthy, Charu, and Carlstein, Ed (1992). "Practical Considerations in Boundary Estimation: Robustness, Efficient Computation, and Bootstrapping."

PRACTICAL CONSIDERATIONS IN BOUNDARY ESTIMATION:
ROBUSTNESS, EFFICIENT COMPUTATION, AND BOOTSTRAPPING
Charu Krishnamoorthy
Research Computing, 2-3224
Glaxo Research Institute
Research Triangle Park, North Carolina 27709
Ed Carlstein*
Department of Statistics
University of North Carolina at Chapel Hill
CB# 3260, Phillips Hall
Chapel Hill, North Carolina 27599-3260
Abstract
A data-set consists of independent observations taken at the nodes of a grid. An unknown
boundary partitions the grid into two regions: all the observations coming from a particular region
share a common distribution, but the distributions differ between the two regions. These
two distributions are entirely unknown, and the grid is of arbitrary dimension with rectangular mesh.
In this scenario, the boundary can be consistently estimated.
The boundary estimate is selected from an appropriate collection $\mathcal{T}$ of candidate boundaries,
which must be specified by the user. The candidate boundaries must satisfy certain regularity
assumptions, including a "richness" condition. In practice, one may be faced with a $\mathcal{T}$ that is not
sufficiently "rich". How robust is the estimator in such a situation? This question is addressed via a.s.
results comparing the asymptotic error of the estimator with the smallest possible error in $\mathcal{T}$.

Because the boundary estimator requires a search (through $\mathcal{T}$) and a calculation over the whole
data grid, efficient computation is a practical necessity. An algorithm is presented which reduces the
computational burden by an order of magnitude relative to the naive approach.
A graphical bootstrap procedure is proposed for studying the variability of the boundary
estimator. Simulations of this procedure indicate that it does accurately reflect the true sampling
behavior of the boundary estimator.
*Research supported by National Science Foundation Grants DMS-8902973 & DMS-9205881.
1. INTRODUCTION
1.1 The Boundary Estimation Problem. We observe a collection of independent r.v.s $\{X_i\}$,
indexed by the nodes $i$ of a finite $d$-dimensional grid $I$ within the $d$-dimensional unit cube
$\mathcal{U}_d := [0,1]^d$. The unknown boundary $\theta$ is a $(d-1)$-dimensional surface that partitions
$\mathcal{U}_d$ into two regions, $\Theta$ and $\bar{\Theta}$. All observations $X_i$ made at nodes
$i \in \Theta$ are from distribution $F$, while all observations $X_i$ made at nodes $i \in \bar{\Theta}$
are from distribution $G$. The distributions $F$ and $G$ are entirely unknown; the only distributional
assumption is that $F \neq G$. The objective is to estimate the unknown boundary $\theta$, using the
observed data $\{X_i : i \in I\}$.
The grid is generated by divisions along each coordinate axis in $\mathcal{U}_d$. Along the $j$th axis
$(1 \le j \le d)$, there are $n_j$ divisions, equally spaced at $1/n_j, 2/n_j, \ldots, n_j/n_j$.
Observations are made at the resulting grid nodes $i := (i_1/n_1, i_2/n_2, \ldots, i_d/n_d) \in \mathcal{U}_d$,
where $i_j \in \{1, 2, \ldots, n_j\}$. The collection of all nodes $i$ is denoted by $I$, and the total
number of observations is
$$|I| := \prod_{j=1}^{d} n_j.$$
In any set $A \subseteq \mathcal{U}_d$, the number of observations (i.e., grid nodes) is
$|A|_I := \#\{i \in A\}$.
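For concreteness, here is a minimal sketch of this node set in Python (the helper name make_grid is ours, not the paper's):

```python
from itertools import product

def make_grid(n):
    """Nodes i = (i_1/n_1, ..., i_d/n_d) of the grid I, given divisions n = (n_1, ..., n_d)."""
    return [tuple(ij / nj for ij, nj in zip(idx, n))
            for idx in product(*(range(1, nj + 1) for nj in n))]

nodes = make_grid((15, 15))   # a d = 2 grid in the unit square, with |I| = 225
```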
The notion of a boundary in $\mathcal{U}_d$ is formulated in a set-theoretic way: the unknown boundary
$\theta$ is identified with the corresponding partition $(\Theta, \bar{\Theta})$ of $\mathcal{U}_d$. The
sample-based estimate of $\theta$, denoted $\hat{\theta}_I$, is selected from a finite collection
$\mathcal{T}_I$ of candidate boundaries, with generic element $T$. Again, each candidate $T$ is identified
with its corresponding partition $(T, \bar{T})$ of $\mathcal{U}_d$. The total number of candidates
considered is $|\mathcal{T}_I| := \#\{T \in \mathcal{T}_I\}$.

1.2 The Boundary Estimator. Our tool for selecting an estimate $\hat{\theta}_I$ from $\mathcal{T}_I$ is the
empirical cumulative distribution function (e.c.d.f.). For a candidate boundary $T \in \mathcal{T}_I$,
compute the e.c.d.f.s
$$\hat{h}_T(x) := \sum_{i \in T} 1\{X_i \le x\}/|T|_I \qquad \text{and} \qquad \hat{h}_{\bar{T}}(x) := \sum_{i \in \bar{T}} 1\{X_i \le x\}/|\bar{T}|_I,$$
and consider the differences
$$d_i^T := |\hat{h}_T(X_i) - \hat{h}_{\bar{T}}(X_i)| \qquad \text{for each } i \in I.$$
Now combine these differences $d_i^T$ using a "mean-dominant norm" $S_{|I|}(d_1, d_2, \ldots, d_{|I|})$
[see Carlstein (1988)], i.e., a function $S_{|I|}(\cdot): \mathbb{R}_+^{|I|} \to \mathbb{R}_+$ satisfying the
following definition:
(D.a) [Symmetry] $S_{|I|}(\cdot)$ is symmetric in its $|I|$ arguments;
(D.b) [Homogeneity] $S_{|I|}(ad_1, ad_2, \ldots, ad_{|I|}) = a \cdot S_{|I|}(d_1, d_2, \ldots, d_{|I|})$ whenever $a \ge 0$;
(D.c) [Triangle Inequal.] $S_{|I|}(d_1+d_1', \ldots, d_{|I|}+d_{|I|}') \le S_{|I|}(d_1, \ldots, d_{|I|}) + S_{|I|}(d_1', \ldots, d_{|I|}')$;
(D.d) [Identity] $S_{|I|}(1, 1, \ldots, 1) = 1$;
(D.e) [Monotonicity] $S_{|I|}(d_1, \ldots, d_{|I|}) \le S_{|I|}(d_1', \ldots, d_{|I|}')$ whenever $d_i \le d_i'$ $\forall i$;
(D.f) [Mean Dominance] $S_{|I|}(d_1, \ldots, d_{|I|}) \ge \sum_{1 \le i \le |I|} d_i \,/\, |I|$.
Finally, we standardize $S_{|I|}(\cdot)$ by a multiplicative factor to account for the inherent instability in the
e.c.d.f. The boundary estimator $\hat{\theta}_I$ is defined as the candidate boundary in $\mathcal{T}_I$ which
maximizes the criterion function
$$D_I(T) := (|T|_I/|I|)(|\bar{T}|_I/|I|) \cdot S_{|I|}(d_i^T : i \in I)$$
over all $T \in \mathcal{T}_I$. Formally,
$$\hat{\theta}_I := \arg\max_{T \in \mathcal{T}_I} D_I(T).$$
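As a concrete (if naive) rendering of this definition, the sketch below computes $D_I(T)$ with $S_{|I|}$ taken to be the arithmetic mean, which satisfies (D.a)-(D.f) (with equality in (D.f)); the function names are illustrative only:

```python
import numpy as np

def criterion(X, in_T):
    """D_I(T) for one candidate T, with S_|I| taken to be the arithmetic mean.

    X    : 1-D array holding the |I| observations X_i (any fixed node ordering).
    in_T : boolean array; in_T[i] is True iff node i lies in the region T.
    """
    n = X.size
    nT = int(in_T.sum())
    if nT == 0 or nT == n:                   # such a T violates R.1; skip it
        return -np.inf
    # e.c.d.f.s of the two candidate regions, evaluated at every observation
    hT = (X[in_T][:, None] <= X[None, :]).mean(axis=0)      # h_T(X_i),    i in I
    hTbar = (X[~in_T][:, None] <= X[None, :]).mean(axis=0)  # h_Tbar(X_i), i in I
    d = np.abs(hT - hTbar)                                  # differences d_i^T
    return (nT / n) * ((n - nT) / n) * d.mean()             # standardized criterion

def boundary_estimate(X, candidates):
    """theta_hat_I: the candidate (a boolean mask) maximizing D_I(T)."""
    return max(candidates, key=lambda in_T: criterion(X, in_T))
```

Evaluating the e.c.d.f.s directly in this way costs order $|I|^2$ operations per candidate; Section 3 shows how to remove that factor.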
The intuitive motivation for this estimator is given by Carlstein and Krishnamoorthy (1992)
[referred to as C&K from here on]. A literature review comparing $\hat{\theta}_I$ to other related methods is
provided in C&K, as is an extensive discussion of examples in the $d=1$ and $d=2$ cases (including the
change-point problem, the epidemic-change model, linear bisection of the plane, templates, and
Lipschitz boundaries).
1.3 Performance of the Boundary Estimator. In order to assess the performance of $\hat{\theta}_I$, we
must quantify the notion of "distance" between two boundaries (say, $T$ and $\theta$). Our "distance"
measure is the pseudometric
$$\delta(T, \theta) := \min\{\lambda(T \,\triangle\, \Theta),\ \lambda(\bar{T} \,\triangle\, \Theta)\},$$
where $\triangle$ denotes set-theoretic symmetric difference, i.e., $(A \,\triangle\, B) := (A \cap B^c) \cup (A^c \cap B)$,
and $\lambda(\cdot)$ is Lebesgue measure over $\mathcal{U}_d$.

Consider the following set-theoretic regularity conditions on the boundaries:

REGULARITY CONDITION (R.1): [Non-trivial Partitions]
For each $T \in \mathcal{T}_I$, $0 < \lambda(T) < 1$ and $0 < |T|_I/|I| < 1$. Also, $0 < \lambda(\Theta) < 1$.

REGULARITY CONDITION (R.2): [Richness of $\mathcal{T}_I$]
For each $I$, $\exists\, T_I \in \mathcal{T}_I$ such that the sequence $\{T_I\}$ satisfies:
$\delta(\theta, T_I) \to 0$ as $|I| \to \infty$.

REGULARITY CONDITION (R.3): [Cardinality of $\mathcal{T}_I$]
For each $r > 0$, $|\mathcal{T}_I| \cdot \exp\{-r \cdot |I|\} \to 0$ as $|I| \to \infty$.
REGULARITY CONDITION (R.4): [Smoothness of Perimeter]
Denote $\mathcal{A}_I := \{\Theta, T : T \in \mathcal{T}_I\}$ and
$\mathcal{P}_I(A) := \{C \in \mathcal{C}_I : C \cap A \neq \emptyset \text{ and } C \cap A^c \neq \emptyset\}$,
where $\mathcal{C}_I$ is the collection of "rectangular" cells $C$ induced by the grid partition of
$\mathcal{U}_d$, and where $A \subseteq \mathcal{U}_d$. We require
$\sup_{A \in \mathcal{A}_I} \lambda(\mathcal{P}_I(A)) \to 0$ as $|I| \to \infty$.
In C&K, these regularity conditions are seen to be intuitively natural, and they are explicitly
checked for several examples. Moreover, R.1-R.4 are shown to imply strong consistency of the
boundary estimator $\hat{\theta}_I$, i.e., $\delta(\theta, \hat{\theta}_I) \to 0$ a.s. as $|I| \to \infty$.
1.4 Some Practical Considerations in Boundary Estimation.

Robustness: Condition R.2 requires $\mathcal{T}_I$ to contain some candidate boundary $T_I$ that "gets
close" to the true $\theta$. If no such "ideal" candidate were available, we could not possibly hope to
statistically select an estimator $\hat{\theta}_I$ from $\mathcal{T}_I$ in such a way that consistency holds.
Yet, in practice, the collection $\mathcal{T}_I$ may be misspecified by the user and may be incompatible
with the true unknown boundary $\theta$. Even when R.2 is violated, one still hopes that $\hat{\theta}_I$
will contain some useful information about $\theta$. In Section 2, this robustness issue is quantified by
comparing the error of $\hat{\theta}_I$ with the minimal possible error in $\mathcal{T}_I$; theoretical
results are given in terms of a.s. asymptotic behavior and in terms of probability of error.
Efficient Computation: Condition R.3 restricts the number of candidate boundaries relative to
the sample size, but it still allows for extremely large $|\mathcal{T}_I|$; and if the user is concerned about
violating R.2, then caution should be exercised in attempting to reduce $|\mathcal{T}_I|$. Moreover, for
higher-dimensional data-sets $(d \ge 2)$, $|I|$ tends to be large. Therefore, since the estimator requires
calculations over all $i \in I$ and over all $T \in \mathcal{T}_I$, it behooves us to develop efficient
computational schemes. In Section 3, an algorithm is presented which improves dramatically on the
naive approach.
Bootstrapping: Although $\hat{\theta}_I$ is a consistent estimator of $\theta$ (in the sense described
above), it is difficult to characterize the sampling variability of the boundary estimator, especially in
cases where the boundary is not readily expressible as a parametric function. Therefore, in Section 4,
we suggest a graphical bootstrap procedure for visually assessing the sampling variability of
$\hat{\theta}_I$; this procedure was simulated, yielding encouraging results.
2. ROBUSTNESS
2.1 Motivation. Knowledge about the form of the boundary $\theta$ (e.g., rectangular template,
Lipschitz function, etc.) has been assumed in the formulation of $\hat{\theta}_I$. This information is used to
construct "rich enough" collections $\mathcal{T}_I$ (see R.2), so that boundaries which are "close" to the true
boundary $\theta$ are available as possible estimates. Thus, if the user knows that $\theta$ defines a circle,
then $\mathcal{T}_I$ is constructed to include only circles. If further information is available on the location
of $\theta$ (say, the center of the circle is known to be in the left half of $\mathcal{U}_2$), then a smaller
set $\mathcal{T}_I$ of circles can be constructed which will still satisfy R.2. Smaller $|\mathcal{T}_I|$ means
easier computability (see Section 3) and sharper bounds on the error probability (see Theorem 2 of C&K).
Therefore there are practical incentives for keeping $\mathcal{T}_I$ small. It is possible, however, to make
$\mathcal{T}_I$ so small that it violates R.2. This may happen through overzealousness in reducing the
computational burden, or through incorrect prior information. Thus, a question of practical interest is:
What is the price paid if the collections $\mathcal{T}_I$ do not satisfy R.2?
We shall explore the case of imperfect knowledge about the form of $\theta$. For example, when
$d = 1$, if we do not know how many change-points (from $F$ to $G$ and vice versa) there are, we can
incorporate our uncertainty into the model by including in $\mathcal{T}_I$ the cases of 1 change-point,
2 change-points (epidemic-change), $\ldots$, $M$ change-points. By Proposition 3 of C&K, we will still
satisfy R.3, the condition limiting the cardinality of $\mathcal{T}_I$. However, we do need to know $M$,
an upper bound on the true number of change-points, in order to satisfy R.2. Another illustration to
keep in mind is the case where the true $\theta$ is a circle, but $\mathcal{T}_I$ contains only squares.
Formally, let
$$\theta_I^\dagger := \arg\min_{T \in \mathcal{T}_I} \delta(\theta, T)$$
be the element in $\mathcal{T}_I$ closest to $\theta$. Then $\delta(\theta_I^\dagger, \theta)$ is the
minimal error that we can expect to make in estimating $\theta$. If the error of our estimator,
$\delta(\hat{\theta}_I, \theta)$, is of the same order as $\delta(\theta_I^\dagger, \theta)$, we shall be
reasonably satisfied, i.e., $\hat{\theta}_I$ is "robust". We will study the asymptotic behavior of
$\delta(\hat{\theta}_I, \theta)$ and compare it to the asymptotic behavior of $\delta(\theta_I^\dagger, \theta)$.
Notice that $\theta_I^\dagger$ is not necessarily unique: there might well be a class of candidates that
are closest in terms of the pseudometric $\delta$, but each of them might yield a different value of
$\delta(\hat{\theta}_I, \theta_I^\dagger)$. Because of this possible ambiguity, the study of
$\delta(\hat{\theta}_I, \theta_I^\dagger)$ is not appropriate here.
2.2 Asymptotic Target Error. Assume that R.1 holds. Given $\theta$ and the candidate family
$\{\mathcal{T}_I : \forall I\}$, define the asymptotic target error, $\eta$, as follows:
$$\eta := \overline{\lim}_{|I| \to \infty}\ \min_{T \in \mathcal{T}_I} \delta(\theta, T) = \overline{\lim}_{|I| \to \infty}\ \delta(\theta, \theta_I^\dagger).$$
We want to study the robustness of $\hat{\theta}_I$ with respect to violation of R.2. The quantity $\eta$ is the natural
parameter for measuring the severity of the failure of R.2, because:

Proposition 1: R.2 $\iff$ $\eta = 0$.

This also justifies the use of $\overline{\lim}$ rather than $\underline{\lim}$ in $\eta$'s definition, since
the latter could yield zero even if R.2 fails, and hence would not adequately quantify the departure from
our conditions. We will assess the robustness of $\hat{\theta}_I$ by comparing $\delta(\hat{\theta}_I, \theta)$
to the target $\eta$.
Another quantity which arises naturally in the robustness analysis is
$$\rho(T) := |\lambda(T \cap \Theta)\,\lambda(\bar{T}) - \lambda(\bar{T} \cap \Theta)\,\lambda(T)|.$$
The difference $\rho(\theta) - \rho(T)$ behaves similarly to $\delta(\theta, T)$, as formalized by:

Proposition 2: For $\tau > 0$,
$$\delta(\theta, T) < \tau \implies \rho(\theta) - \rho(T) < \tau;$$
$$\delta(\theta, T) > \tau \implies \rho(\theta) - \rho(T) > \sigma \cdot \tau > 0,$$
where $\sigma := \min\{\lambda(\Theta), \lambda(\bar{\Theta})\}$. It follows that
$$\rho(\theta) - \rho(T) \le \delta(\theta, T) \le [\rho(\theta) - \rho(T)]/\sigma.$$
When the partition defined by $\theta$ is "balanced", i.e., $\sigma = \tfrac{1}{2}$, we find
$\delta(\theta, T) = 2[\rho(\theta) - \rho(T)]$.
[Proofs of results are given in Section 2.4 below.] When R.2 does hold, then
$$\lim_{|I| \to \infty}\ \min_{T \in \mathcal{T}_I}\ \{\rho(\theta) - \rho(T)\} = 0.$$
However, unlike $\delta(\theta, T)$, the quantity $\rho(\theta) - \rho(T)$ is not a pseudometric; in fact,
it is not even symmetric in its arguments. Nevertheless, to analyze the robustness of $\hat{\theta}_I$,
it is useful to define the analog of $\eta$ for $\rho(\theta) - \rho(T)$:
$$\eta^* := \overline{\lim}_{|I| \to \infty}\ \min_{T \in \mathcal{T}_I}\ \{\rho(\theta) - \rho(T)\} = \overline{\lim}_{|I| \to \infty}\ [\rho(\theta) - \rho(\theta_I^\ddagger)],$$
where $\theta_I^\ddagger := \arg\min_{T \in \mathcal{T}_I} \{\rho(\theta) - \rho(T)\}$. The relationship
between $\eta$ and $\eta^*$ is given by:

Proposition 3: $0 \le \eta^* \le \eta \le \eta^*/\sigma$.
Notice that R.2 $\iff$ $\eta = 0$ $\iff$ $\eta^* = 0$; also, when the partition defined by $\theta$ is
balanced, the third inequality becomes an equality.
2.3 Robustness Results. Assume now that: $F \neq G$; $\hat{\theta}_I$ is based on a mean-dominant norm;
and regularity conditions R.1, R.3, R.4 hold. The main result provides an a.s. bound on $\hat{\theta}_I$'s
error that is linear in the target error.

Theorem 1: [Robustness]
$$\overline{\lim}_{|I| \to \infty}\ \delta(\theta, \hat{\theta}_I) \le \eta^*/(\sigma\mu) \le \eta/(\sigma\mu) \quad \text{a.s.},$$
where $\mu := \mu_F\,\lambda(\Theta) + \mu_G\,\lambda(\bar{\Theta})$ and
$$\mu_F := \int_{-\infty}^{\infty} |F(x) - G(x)|\, dF(x), \qquad \mu_G := \int_{-\infty}^{\infty} |F(x) - G(x)|\, dG(x).$$
When the partition defined by $\theta$ is balanced, the bound becomes $2\eta^*/\mu = \eta/\mu$.

Notice that $\sigma\mu > 0$ (by R.1 and Lemma 7 of C&K). The upper bounds in Theorem 1 show that the
robustness of $\hat{\theta}_I$ is favorably influenced by two intuitively natural factors: (i) balance between
the amount of data in $\Theta$ versus $\bar{\Theta}$, as measured by $\sigma$; and (ii) disparity between the
two distributions $F$ and $G$, as measured by $\mu$.
Example: Let $F$ be the distribution with point mass at the origin, and let $G$ be the
Uniform$[0,1]$ distribution. Then direct calculations yield $\mu_F + \mu_G = \tfrac{3}{2}$.
Therefore, in the case of a balanced partition $(\Theta, \bar{\Theta})$, we obtain
$$\overline{\lim}_{|I| \to \infty}\ \delta(\theta, \hat{\theta}_I) \le \tfrac{8}{3}\,\eta^* \quad \text{a.s.}$$
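The two quantities are elementary to compute here, since $dF$ is a unit point mass at the origin and $|F(x) - G(x)| = 1 - x$ on $[0, 1]$:
$$\mu_F = |F(0) - G(0)| = 1, \qquad \mu_G = \int_0^1 (1 - x)\, dx = \tfrac{1}{2};$$
with $\lambda(\Theta) = \lambda(\bar{\Theta}) = \tfrac{1}{2}$, this gives $\mu = \tfrac{3}{4}$, and the balanced-case bound of Theorem 1 is $2\eta^*/\mu = \tfrac{8}{3}\eta^*$.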
Finally, we bound the probability of $\hat{\theta}_I$ behaving in a non-robust way (as compared to the
r.h.s. of Theorem 1).

Theorem 2: [Bound on Error Probability] For any $0 < \alpha < 1$ and $0 < \epsilon \le \epsilon(\alpha)$,
$$P\{\delta(\theta, \hat{\theta}_I) > (\eta^* + \epsilon)/(\sigma\mu(1 - \alpha))\} \le K_1 \cdot |\mathcal{T}_I| \cdot \exp\{-K_2 \cdot \epsilon^2 \cdot |I|\}$$
for $|I|$ sufficiently large, where $K_1$ and $K_2$ are positive constants.

The probability of non-robustness decreases exponentially as a function of sample size. However, this
effect is counterbalanced by the number of candidate boundaries considered (i.e., for fixed target error,
it is easier to grossly mislead $\hat{\theta}_I$ when the collection $\mathcal{T}_I$ is larger). The bound in
Theorem 2 explains why the joint growth of $|I|$ and $|\mathcal{T}_I|$ must be controlled via R.3. Lastly,
notice that the bound does not depend on the actual target error, i.e., it is unaffected by the severity of
the R.2 violation.
2.4 Proofs.

Proof of Proposition 1: Immediate from the definitions. □

Proof of Proposition 2: Note that
$$\rho(\theta) - \rho(T) = \min\{\lambda(\bar{\Theta} \cap T)\lambda(\Theta) + \lambda(\Theta \cap \bar{T})\lambda(\bar{\Theta}),\ \ \lambda(\Theta \cap T)\lambda(\bar{\Theta}) + \lambda(\bar{\Theta} \cap \bar{T})\lambda(\Theta)\}.$$
Comparing this expression to the definition of $\delta(\theta, T)$, the first implication is clear. For the
second implication, write $\delta(\theta, T) = \min\{z + z',\ y + y'\}$ and
$\rho(\theta) - \rho(T) = \min\{r_\theta(z, z'),\ r_\theta(y, y')\}$, where
$r_\theta(z, z') := \lambda(\Theta)z + \lambda(\bar{\Theta})z'$. Then
$$\delta(\theta, T) > \tau \implies z + z' > \tau \text{ and } y + y' > \tau \implies \sigma(z + z') > \sigma\tau \text{ and } \sigma(y + y') > \sigma\tau \implies \rho(\theta) - \rho(T) > \sigma\tau.$$
The final two assertions of Proposition 2 are now immediate. □
Proof of Proposition 3: Since $\theta_I^\dagger, \theta_I^\ddagger \in \mathcal{T}_I$, we have by definition
and by Proposition 2 that
$$0 \le \rho(\theta) - \rho(\theta_I^\ddagger) \le \rho(\theta) - \rho(\theta_I^\dagger) \le \delta(\theta, \theta_I^\dagger) \le \delta(\theta, \theta_I^\ddagger) \le [\rho(\theta) - \rho(\theta_I^\ddagger)]/\sigma.$$
Now take $\overline{\lim}_{|I| \to \infty}$ throughout. □
Proof of Theorem 1: For $0 < \alpha < 1$ and $0 < \epsilon \le \epsilon(\alpha)$, we have by Theorem 2 and
R.3 that $\sum_{|I|} P\{\delta(\theta, \hat{\theta}_I) > (\eta^* + \epsilon)/(\sigma\mu(1 - \alpha))\} < \infty$;
hence, by the Borel-Cantelli Lemma,
$P\{\delta(\theta, \hat{\theta}_I) > (\eta^* + \epsilon)/(\sigma\mu(1 - \alpha)) \text{ infinitely often } [|I|]\} = 0$.
Now choose $\alpha_k \downarrow 0$ and $\epsilon_k \downarrow 0$ with $\epsilon_k \le \epsilon(\alpha_k)$,
and let $A_k$ denote the corresponding null event; for $\omega \notin A_k$ we have
$\delta(\theta, \hat{\theta}_I) \le (\eta^* + \epsilon_k)/(\sigma\mu(1 - \alpha_k))$ when $|I| \ge i(k, \omega)$.
Thus, for $\omega \notin \bigcup_{k \ge 1} A_k$,
$$\overline{\lim}_{|I| \to \infty}\ \delta(\theta, \hat{\theta}_I) \le \eta^*/(\sigma\mu). \quad \square$$

Proof of Theorem 2: A series of lemmas will be presented, leading to Theorem 2.

Lemma 1: Define
$\mathcal{A}_I^* := \{\Theta, \bar{\Theta}, T, \bar{T}, T \cap \Theta, \bar{T} \cap \Theta, T \cap \bar{\Theta}, \bar{T} \cap \bar{\Theta} : T \in \mathcal{T}_I\}$. Then:
$$\sup_{A \in \mathcal{A}_I^*} \big|\lambda(A) - |A|_I/|I|\big| \to 0 \quad \text{as } |I| \to \infty.$$
Proof: Follows from R.4, using the same argument as in the proof of C&K's Lemma 1. □
Define the following notation:
$$\tilde{h}_T(x) := [\lambda(T \cap \Theta)F(x) + \lambda(T \cap \bar{\Theta})G(x)]/\lambda(T), \qquad \tilde{h}_{\bar{T}}(x) := [\lambda(\bar{T} \cap \Theta)F(x) + \lambda(\bar{T} \cap \bar{\Theta})G(x)]/\lambda(\bar{T}),$$
$$\delta_i^T := |\tilde{h}_T(X_i) - \tilde{h}_{\bar{T}}(X_i)|, \qquad \Delta_I(T) := \lambda(T)\lambda(\bar{T}) \cdot S_{|I|}(\delta_i^T : i \in I).$$

Lemma 2: For $|I| \ge N_1(\epsilon)$,
$$P\{\sup_{T \in \mathcal{T}_I} |D_I(T) - \Delta_I(T)| > \epsilon\} \le K_1 \cdot |\mathcal{T}_I| \cdot \exp\{-K_2 \cdot \epsilon^2 \cdot |I|\}.$$
Proof: Follows from Lemma 1 and Dvoretzky, Kiefer, and Wolfowitz (1956), using the same argument as
in the proof of C&K's Lemma 2 (with $\delta = 0$). □
Lemma 3: We can write $\Delta_I(T) = \rho(T) \cdot S_{|I|}(\delta_i^\theta : i \in I)$, where
$\delta_i^\theta := |F(X_i) - G(X_i)|$.
Proof: Exactly as in the proof of C&K's Lemma 3. □

Lemma 4: For every $T \in \mathcal{T}_I$, we have $\Delta_I(T) \le \Delta_I(\theta)$.
Proof: Follows from Lemma 3, exactly as in the proof of C&K's Lemma 4. □
Lemma 5: For $|I|$ sufficiently large,
$$P\{|\Delta_I(\hat{\theta}_I) - \Delta_I(\theta)| > \eta^* + \epsilon\} \le K_1 \cdot |\mathcal{T}_I| \cdot \exp\{-K_2 \cdot \epsilon^2 \cdot |I|\}$$
(with positive constants $K_1$, $K_2$, not necessarily those of Lemma 2).
Proof: By definition, $\theta_I^\ddagger \in \mathcal{T}_I$ is the maximizer of $\rho(\cdot)$, and hence, by
Lemma 3, it is the maximizer of $\Delta_I(\cdot)$, over $\mathcal{T}_I$. Then by Lemma 4 we have
$\Delta_I(\theta) \ge \Delta_I(\theta_I^\ddagger) \ge \Delta_I(T)$ $\forall T \in \mathcal{T}_I$, and by
definition we have $D_I(\hat{\theta}_I) \ge D_I(T)$ $\forall T \in \mathcal{T}_I$. Now,
$$|\Delta_I(\hat{\theta}_I) - \Delta_I(\theta)| \le |\Delta_I(\hat{\theta}_I) - D_I(\hat{\theta}_I)| + |D_I(\hat{\theta}_I) - \Delta_I(\theta_I^\ddagger)| + |\Delta_I(\theta_I^\ddagger) - \Delta_I(\theta)|. \quad (\dagger)$$
The second modulus on the r.h.s. is bounded by $\sup_{T \in \mathcal{T}_I} |D_I(T) - \Delta_I(T)|$, because
either $D_I(\hat{\theta}_I) \ge \Delta_I(\theta_I^\ddagger) \ge \Delta_I(\hat{\theta}_I)$ or
$\Delta_I(\theta_I^\ddagger) \ge D_I(\hat{\theta}_I) \ge D_I(\theta_I^\ddagger)$. The same bound applies to
the first modulus on the r.h.s. of $(\dagger)$. Using Lemmas 4 and 3, and properties (D.e) & (D.d) of
$S_{|I|}$, the third modulus on the r.h.s. is
$\Delta_I(\theta) - \Delta_I(\theta_I^\ddagger) = [\rho(\theta) - \rho(\theta_I^\ddagger)] \cdot S_{|I|}(\delta_i^\theta : i \in I) \le \rho(\theta) - \rho(\theta_I^\ddagger)$.
Therefore,
$$|\Delta_I(\hat{\theta}_I) - \Delta_I(\theta)| \le 2 \cdot \sup_{T \in \mathcal{T}_I} |D_I(T) - \Delta_I(T)| + \rho(\theta) - \rho(\theta_I^\ddagger).$$
For $|I|$ sufficiently large, by the definition of $\eta^*$, we have the deterministic bound
$\rho(\theta) - \rho(\theta_I^\ddagger) < \eta^* + \epsilon/3$. Now combine this with the probabilistic bound
from Lemma 2. □
To prove Theorem 2, begin by applying Proposition 2, Lemma 4, Lemma 3, and property (D.f)
of $S_{|I|}$, obtaining:
$$\delta(\theta, T) > \tau \implies |\Delta_I(\theta) - \Delta_I(T)| = [\rho(\theta) - \rho(T)] \cdot S_{|I|}(\delta_i^\theta : i \in I) > \sigma\tau\bar{\delta}_I,$$
where $\bar{\delta}_I := \sum_{i \in I} \delta_i^\theta / |I|$. Thus,
$$P\{\delta(\theta, \hat{\theta}_I) > (\eta^* + \epsilon)/(\sigma\mu(1 - \alpha))\} \le P\{|\Delta_I(\theta) - \Delta_I(\hat{\theta}_I)| > (\eta^* + \epsilon)\bar{\delta}_I/(\mu(1 - \alpha))\}$$
$$\le P\{|\Delta_I(\theta) - \Delta_I(\hat{\theta}_I)| > \eta^* + \epsilon\} + P\{\bar{\delta}_I < \mu(1 - \alpha)\}.$$
The first probability on the r.h.s. is immediately handled by Lemma 5. The second probability on the
r.h.s. is bounded by $P\{|\bar{\delta}_I - \mu| > \mu\alpha\}$. Denote
$\bar{\delta}_I^\Theta := \sum_{i \in \Theta} \delta_i^\theta / |\Theta|_I$ and
$\bar{\delta}_I^{\bar{\Theta}} := \sum_{i \in \bar{\Theta}} \delta_i^\theta / |\bar{\Theta}|_I$, so that
$$|\bar{\delta}_I - \mu| \le \big| |\Theta|_I/|I| - \lambda(\Theta) \big| + \lambda(\Theta) \cdot |\bar{\delta}_I^\Theta - \mu_F| + \big| |\bar{\Theta}|_I/|I| - \lambda(\bar{\Theta}) \big| + \lambda(\bar{\Theta}) \cdot |\bar{\delta}_I^{\bar{\Theta}} - \mu_G|.$$
The first and third summands on the r.h.s. are handled using Lemma 1. Consider the second
summand on the r.h.s. (a similar argument holds for the fourth). By equation (2.3) of Hoeffding
(1963), we have
$$P\{|\bar{\delta}_I^\Theta - \mu_F| > \mu\alpha/4\} \le 2 \cdot \exp\{-c_\alpha |\Theta|_I\},$$
where $c_\alpha > 0$ depends on $\alpha$. Lemma 1 ensures that $|\Theta|_I$ eventually exceeds
$|I|\lambda(\Theta)/2$ (say); therefore the bound from Hoeffding (1963) can be combined with the earlier
bound from Lemma 5, for $\epsilon > 0$ sufficiently small (given $\alpha$). □
3. EFFICIENT COMPUTATION

To calculate the boundary estimator $\hat{\theta}_I$ based on the data $\{X_i : i \in I\}$, we need to
calculate the criterion function $D_I(T)$ for each candidate $T \in \mathcal{T}_I$. The number of simple
computations $C_I$ required to calculate $\hat{\theta}_I$ is essentially
$$C_I \approx |\mathcal{T}_I| \times \#\{\text{computations for calculating } D_I(T) \text{ on a single } T\}.$$
The number of computations for calculating a single $D_I(T)$ seems on first thought to be of the order
$|I|^2$, since we have to calculate $|I|$ distinct $d_i^T$'s, and each $d_i^T$ requires us to find the rank
of $X_i$ $(i \in I)$ relative to the subsets $\{X_j : j \in T\}$ and $\{X_j : j \in \bar{T}\}$. This would yield
$$C_I^0 \approx |\mathcal{T}_I| \times |I|^2.$$
Algorithm: Notice that the calculation of the $d_i^T$'s involves ranking the $X_i$'s within each
individual candidate region, $T$ and $\bar{T}$. For fixed $T$, the $\hat{h}_T(X_i)$'s $(1 \le i \le |I|)$
can be calculated from an ordered subsequence of the ordered sequence $\{X_{(r)} : 1 \le r \le |I|\}$;
similarly for the $\hat{h}_{\bar{T}}(X_i)$'s. Thus $\hat{\theta}_I$ is a function only of
$\{R(i),\ 1\{i \in T\} : i \in I,\ T \in \mathcal{T}_I\}$, where $R(i)$ is the rank of $X_i$ in the ordered
sequence. The suggested computational algorithm exploits this structure.

First sort the $X_i$'s so that
$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(r)} \le \cdots \le X_{(|I|-1)} \le X_{(|I|)}.$$
Since $S_{|I|}(\cdot)$ is symmetric [property (D.a)], its arguments $d_i^T$ may be entered according to
rank order $(r)$ rather than grid location $i$. The rank of $X_{(r)}$ is $r$, and its original grid index is
stored in INDEX$(r)$. Now $\hat{\theta}_I$ is a function only of
$\{1\{\text{INDEX}(r) \in T\} : 1 \le r \le |I|,\ T \in \mathcal{T}_I\}$. Sorting of the $X_i$'s and
calculation of the vector INDEX$(r)$ together take at most order $|I|^2$ computations.
The collection of $\hat{h}_T(X_{(r)})$'s (and $\hat{h}_{\bar{T}}(X_{(r)})$'s) is now calculated recursively,
as follows. The quantity $|T|_I\,\hat{h}_T(X_{(1)})$ is 1 if the smallest observation $X_{(1)}$ comes from
$T$, and 0 if it does not. Given $|T|_I\,\hat{h}_T(X_{(r)})$, the quantity $|T|_I\,\hat{h}_T(X_{(r+1)})$ is
one more than $|T|_I\,\hat{h}_T(X_{(r)})$ if $X_{(r+1)}$ comes from $T$, and the same as
$|T|_I\,\hat{h}_T(X_{(r)})$ if it does not. Formally,
$$|T|_I\,\hat{h}_T(X_{(1)}) = 1\{\text{INDEX}(1) \in T\},$$
$$|T|_I\,\hat{h}_T(X_{(r+1)}) = |T|_I\,\hat{h}_T(X_{(r)}) + 1\{\text{INDEX}(r+1) \in T\}, \quad \text{for } r = 1, 2, \ldots, |I| - 1.$$
Also, $|\bar{T}|_I\,\hat{h}_{\bar{T}}(X_{(r)}) = r - |T|_I\,\hat{h}_T(X_{(r)})$. Although the symbols
$|T|_I$, $|\bar{T}|_I$, and $X_{(r)}$ are used in the explanation above, the recursive calculations (r.h.s.)
depend only on $1\{\text{INDEX}(r) \in T\}$. The numerical values of $|T|_I$ and $|\bar{T}|_I$ are obtained
"for free", since $|T|_I = |T|_I\,\hat{h}_T(X_{(|I|)})$ and $|\bar{T}|_I = |I| - |T|_I$. Thus, for fixed
$T \in \mathcal{T}_I$, calculation of $D_I(T)$ requires us to go through
$\{1\{\text{INDEX}(r) \in T\} : 1 \le r \le |I|\}$ exactly once.
Combining both phases of the algorithm, we have
$$C_I \approx |I|^2 + |\mathcal{T}_I| \times |I|,$$
which is an order of magnitude less than the initial naive approach, i.e., smaller than $C_I^0$ by a
factor of $|I|$. A sketch of both phases is given below.
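The sketch below (ours, with the explicit recursion replaced by an equivalent cumulative sum, and with the arithmetic mean again standing in for $S_{|I|}$) does order $|I|$ work per candidate after one initial sort:

```python
import numpy as np

def boundary_estimate_fast(X, candidates):
    """One sort up front, then order-|I| work per candidate T."""
    n = X.size
    order = np.argsort(X, kind="stable")      # order[r-1] = INDEX(r)
    r = np.arange(1, n + 1)
    best, best_val = None, -np.inf
    for in_T in candidates:                   # in_T: boolean mask over the nodes
        ind = in_T[order].astype(float)       # 1{INDEX(r) in T}, r = 1..|I|
        cum = np.cumsum(ind)                  # |T|_I * h_T(X_(r)), via the recursion
        nT = cum[-1]                          # |T|_I, obtained "for free"
        if nT == 0 or nT == n:                # violates R.1; skip
            continue
        hT = cum / nT                         # h_T(X_(r))
        hTbar = (r - cum) / (n - nT)          # h_Tbar(X_(r)) = (r - cum)/|Tbar|_I
        d = np.abs(hT - hTbar)                # d^T in rank order (legitimate by D.a)
        val = (nT / n) * ((n - nT) / n) * d.mean()
        if val > best_val:
            best, best_val = in_T, val
    return best
```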
Examples: In most applications, $|\mathcal{T}_I|$ is a function of $|I|$. Table 1 gives the approximate
number of computations $C_I$ for several specific examples. [See C&K for precise descriptions of
each case.]
TABLE 1: Number of Simple Computations.

Boundary Type                  | Dimension $d$ | # of Candidates $|\mathcal{T}_I|$ | # of Computations $C_I$
Change-Point                   | 1             | $|I|$                             | $|I|^2$
Epidemic-Change                | 1             | $|I|^2$                           | $|I|^3$
Template (Oriented Rectangle)  | 2             | $|I|^2$                           | $|I|^3$
4. BOOTSTRAPPING
The boundary estimator $\hat{\theta}_I$ is applicable in a broad range of situations: there are no
restrictions on the distributions $F$ and $G$, and the class of boundaries $\mathcal{T}_I$ is quite general.
In keeping with this omnibus spirit, it is appropriate to use Efron's (1979) bootstrap for assessing the
sampling variability of $\hat{\theta}_I$. Dümbgen (1991) has studied bootstrap procedures for a related
class of estimators in the 1-dimensional single change-point case. We now propose a graphical bootstrap
procedure that is particularly useful in the planar $(d = 2)$ case.
The true underlying probability structure is determined by the unknown triplet $(F, G; \theta)$.
Having estimated $\theta$ with $\hat{\theta}_I$, the e.c.d.f.s constructed from the two regions
$\hat{\Theta}_I$ and $\bar{\hat{\Theta}}_I$ induced by $\hat{\theta}_I$ provide estimates of $F$ and $G$:
$$\hat{F}_I(x) := \sum_{i \in \hat{\Theta}_I} 1\{X_i \le x\}/|\hat{\Theta}_I|_I, \qquad \hat{G}_I(x) := \sum_{i \in \bar{\hat{\Theta}}_I} 1\{X_i \le x\}/|\bar{\hat{\Theta}}_I|_I.$$
We now have the estimated probability structure $(\hat{F}_I, \hat{G}_I; \hat{\theta}_I)$, from which
bootstrap samples can be drawn. Generate $\{X_i^* : i \in \hat{\Theta}_I\}$ as i.i.d. observations from
$\hat{F}_I$, and $\{X_i^* : i \in \bar{\hat{\Theta}}_I\}$ as i.i.d. observations from $\hat{G}_I$. From this
bootstrap sample $\{X_i^* : i \in I\}$, calculate $\hat{\theta}_I^*$ via the same formula as
$\hat{\theta}_I$. By repeating this process, say, $B$ times, we obtain bootstrap realizations
$\{\hat{\theta}_{I1}^*, \hat{\theta}_{I2}^*, \ldots, \hat{\theta}_{IB}^*\}$ from
$(\hat{F}_I, \hat{G}_I; \hat{\theta}_I)$. These bootstrap realizations $\hat{\theta}_{Ik}^*$ are used to
model the sampling variability of $\hat{\theta}_I$.
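A minimal sketch of this resampling loop, assuming the boundary_estimate_fast helper sketched in Section 3 (generating i.i.d. observations from an e.c.d.f. amounts to resampling the corresponding data with replacement):

```python
import numpy as np

def graphical_bootstrap(X, candidates, B=1000, seed=0):
    """Bootstrap realizations of the boundary estimate from (F_hat, G_hat; theta_hat)."""
    rng = np.random.default_rng(seed)
    est = boundary_estimate_fast(X, candidates)     # theta_hat_I as a boolean mask
    F_data, G_data = X[est], X[~est]                # data defining F_hat_I and G_hat_I
    boot = []
    for _ in range(B):
        X_star = np.empty_like(X)
        X_star[est] = rng.choice(F_data, size=F_data.size)    # i.i.d. from F_hat_I
        X_star[~est] = rng.choice(G_data, size=G_data.size)   # i.i.d. from G_hat_I
        boot.append(boundary_estimate_fast(X_star, candidates))
    return est, boot
```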
Each $\hat{\theta}_{Ik}^*$ determines a partition of the grid $I$. If each of these
$\hat{\theta}_{Ik}^*$'s were a number, we could look at the variance, percentiles, and histogram of these
numbers in order to assess the variability of $\hat{\theta}_I$. But in some situations (e.g., Lipschitz
curves [see C&K]), the boundaries are not readily expressible in terms of a fixed finite set of parameters.
Moreover, in the 2-dimensional case it is natural to want a direct visual representation of
$\hat{\theta}_I^*$'s variability in $\mathcal{U}_2$.
For this purpose, we introduce the $(1 - \alpha)100\%$ bootstrap indifference zone,
$Z^*_{(1-\alpha)}$, which is defined as follows:
$$p_i^* := \sum_{k=1}^{B} 1\{i \in \hat{\Theta}_{Ik}^*\}/B, \quad i \in I;$$
$$Z^*_{(1-\alpha)} := \{i \in I : \alpha/2 \le p_i^* \le 1 - \alpha/2\};$$
i.e., $Z^*_{(1-\alpha)}$ excludes those points $i \in I$ which either belong to more than
$(1 - \alpha/2)100\%$ of the $\hat{\Theta}_{Ik}^*$'s or belong to more than $(1 - \alpha/2)100\%$ of the
$\bar{\hat{\Theta}}_{Ik}^*$'s.
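In code, the zone is a short computation over the stacked bootstrap masks (continuing the sketch above):

```python
import numpy as np

def indifference_zone(boot, alpha=0.10):
    """p_i^* and the (1 - alpha)100% bootstrap indifference zone Z*."""
    p_star = np.mean(np.asarray(boot, dtype=float), axis=0)   # p_i^*, i in I
    zone = (alpha / 2 <= p_star) & (p_star <= 1 - alpha / 2)  # Z*_(1-alpha) as a mask
    return p_star, zone
```

The proportion of grid points falling inside the zone (the quantity $q^*$ used below) is then simply zone.mean().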
The zone $Z^*_{(1-\alpha)}$ is conditional on the estimated probability structure
$(\hat{F}_I, \hat{G}_I; \hat{\theta}_I)$. A large or wide $Z^*_{(1-\alpha)}$ typically indicates high
spatial variability amongst the $\hat{\theta}_{Ik}^*$'s, and similarly a nonexistent or narrow
$Z^*_{(1-\alpha)}$ means that the $\hat{\theta}_{Ik}^*$'s have little spatial variability. Furthermore, if
the boundary $\hat{\theta}_I$ lies completely inside $Z^*_{(1-\alpha)}$, this suggests that the spatial
"bias" of $\hat{\theta}_I^*$ (as an "estimator" of $\hat{\theta}_I$) is small. Finally, by analogy, we pass
these conclusions about the bootstrap distribution of $\hat{\theta}_I^*$ (variability and bias) back to
the sampling distribution of $\hat{\theta}_I$.

In practice, the entire bootstrap procedure described above is implemented on a single set of
data $\{X_i : i \in I\}$.
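An end-to-end illustration on synthetic data (this is our own toy design, not the paper's simulation: a vertical-line boundary with $F = N(0,1)$ and $G = N(1,1)$, and vertical-line candidates; it reuses the helpers sketched above):

```python
import numpy as np

rng = np.random.default_rng(1)
nodes = np.array(make_grid((15, 15)))                   # 15 x 15 grid, |I| = 225
truth = nodes[:, 0] <= 0.5                              # Theta: left of x = 0.5
X = np.where(truth, rng.normal(0.0, 1.0, truth.size),   # F on Theta
                    rng.normal(1.0, 1.0, truth.size))   # G elsewhere
candidates = [nodes[:, 0] <= c for c in np.arange(1, 15) / 15]
est, boot = graphical_bootstrap(X, candidates, B=200)
p_star, zone = indifference_zone(boot, alpha=0.10)
q_star = zone.mean()                                    # proportion inside Z*_(.90)
```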
Simulations: We simulated the entire bootstrap procedure 100 times for the case of a linear
bisecting boundary [see C&K] in a 2-dimensional $15 \times 15$ grid; the true boundary $\theta$ connects
the points $(0.67, 0.00)$ and $(0.40, 1.00)$ in $\mathcal{U}_2$. For each of these 100 simulations, we
obtained first $\hat{\theta}_I$, then a picture of $Z^*_{(.90)}$, and then the proportion $(q^*)$ of grid
points falling inside $Z^*_{(.90)}$; the bootstrap procedure used $B = 1000$. Figures 1.a, 1.b, and 1.c
show the results for three of the 100 simulations; these were selected to represent the performance of
the bootstrap under "good", "average", and "poor" estimates $\hat{\theta}_I$. Each figure illustrates
$\hat{\theta}_I$, $Z^*_{(.90)}$, and $q^*$.

As a basis of comparison for judging the bootstrap, we generated 1000 realizations of
$\hat{\theta}_I$ from the true $(F, G; \theta)$. The true sampling variability of $\hat{\theta}_I$ is thus
measured by the analogous quantities $Z_{(.90)}$ and $q$, which are shown in Figure 2 (along with
$\theta$).
Notice that the bootstrapped zones $Z^*_{(.90)}$ in Figure 1 are reasonably comparable in shape and
size $(q^*)$ to the true $Z_{(.90)}$ and $q$ in Figure 2. Also note that each $Z^*_{(.90)}$ in Figure 1
contains its $\hat{\theta}_I$, just as $Z_{(.90)}$ contains $\theta$ in Figure 2. So it seems that the
proposed graphical bootstrap procedure is useful for visually assessing the sampling behavior of
$\hat{\theta}_I$.
FIGURE 1: Simulated Bootstrap Indifference Zones.
[Three $15 \times 15$ character-grid plots, each showing $Z^*_{(.90)}$ together with its $\hat{\theta}_I$:
Figure 1.a ($q^* = .16$), Figure 1.b ($q^* = .28$), Figure 1.c ($q^* = .22$).]
FIGURE 2: Simulated Behavior of $\hat{\theta}_I$.
[A $15 \times 15$ character-grid plot showing $Z_{(.90)}$ together with the true $\theta$ ($q = .21$).]
REFERENCES
Carlstein, E. (1988), "Nonparametric Change-Point Estimation," Annals of Statistics, 16, 188-197.
Carlstein, E. and Krishnamoorthy, C. (1992), "Boundary Estimation," Journal of the American
Statistical Association, 87, 430-438.
Dümbgen, L. (1991), "The Asymptotic Behavior of Some Nonparametric Change-Point Estimators,"
Annals of Statistics, 19, 1471-1495.
Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956), "Asymptotic Minimax Character of the
Sample Distribution Function and of the Classical Multinomial Estimator," Annals of
Mathematical Statistics, 27, 642-669.
Efron, B. (1979), "Bootstrap Methods: Another Look at the Jackknife," Annals of Statistics, 7, 1-26.
Hoeffding, W. (1963), "Probability Inequalities for Sums of Bounded Random Variables,"
Journal of the American Statistical Association, 58, 13-30.